On Hedden's proof that machine learning fairness metrics are flawed
In: Inquiry: an interdisciplinary journal of philosophy and the social sciences, pp. 1-20
ISSN: 1502-3923
In: https://ora.ox.ac.uk/objects/uuid:0c4cc51d-b2d3-4843-82ad-928e3b33e119
Western societies are marked by diverse and extensive biases and inequality that are unavoidably embedded in the data used to train machine learning. Algorithms trained on biased data will, without intervention, produce biased outcomes and increase the inequality experienced by historically disadvantaged groups. Recognising this problem, much work has emerged in recent years to test for bias in machine learning and AI systems using various fairness and bias metrics. Often these metrics address technical bias but ignore the underlying causes of inequality. In this paper we make three contributions. First, we assess the compatibility of fairness metrics used in machine learning against the aims and purpose of EU non-discrimination law. We show that the fundamental aim of the law is not only to prevent ongoing discrimination, but also to change society, policies, and practices to 'level the playing field' and achieve substantive rather than merely formal equality. Based on this, we then propose a novel classification scheme for fairness metrics in machine learning based on how they handle pre-existing bias and thus align with the aims of non-discrimination law. Specifically, we distinguish between 'bias preserving' and 'bias transforming' fairness metrics. Our classification system is intended to bridge the gap between non-discrimination law and decisions around how to measure fairness in machine learning and AI in practice. Finally, we show that the legal need for justification in cases of indirect discrimination can impose additional obligations on developers, deployers, and users that choose to use bias preserving fairness metrics when making decisions about individuals because they can give rise to prima facie discrimination. To achieve substantive equality in practice, and thus meet the aims of the law, we instead recommend using bias transforming metrics. To conclude, we provide concrete recommendations including a user-friendly checklist for choosing the most appropriate fairness metric for uses of machine ...
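As a rough illustration of the distinction drawn in the abstract above (not the authors' own formalisation), the sketch below contrasts a commonly cited bias-preserving metric (an equalized-odds-style true-positive-rate gap, which treats the observed labels as ground truth) with a bias-transforming metric (the demographic parity difference, which does not condition on those labels). All groups, labels and values are invented.

```python
# Minimal sketch contrasting a bias-preserving metric (equalized odds,
# anchored to observed labels) with a bias-transforming metric
# (demographic parity, independent of observed labels).
# All data below are invented for illustration only.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # observed (possibly biased) labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])   # model decisions
group  = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

def selection_rate(pred):
    return pred.mean()

def tpr(true, pred):
    mask = true == 1
    return pred[mask].mean() if mask.any() else np.nan

# Bias-transforming: demographic parity difference ignores y_true entirely.
dp_diff = abs(selection_rate(y_pred[group == "a"]) - selection_rate(y_pred[group == "b"]))

# Bias-preserving: an equalized-odds-style TPR gap treats y_true as ground truth,
# so any historical bias encoded in the labels is carried forward.
tpr_gap = abs(tpr(y_true[group == "a"], y_pred[group == "a"])
              - tpr(y_true[group == "b"], y_pred[group == "b"]))

print(f"demographic parity difference: {dp_diff:.2f}")
print(f"TPR gap (equalized-odds component): {tpr_gap:.2f}")
```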
BASE
In: West Virginia Law Review, Volume 123, Issue 3
SSRN
Working paper
In: AI and ethics
ISSN: 2730-5961
To monitor and prevent bias in AI systems, we can use a wide range of (statistical) fairness measures. However, it is mathematically impossible to optimize all of these measures at the same time. In addition, optimizing a fairness measure often greatly reduces the accuracy of the system (Kozodoi et al., Eur J Oper Res 297:1083–1094, 2022). As a result, we need a substantive theory that informs us how to make these decisions and for what reasons. I show that by using Rawls' notion of justice as fairness, we can create a basis for navigating fairness measures and the accuracy trade-off. In particular, this leads to a principled choice focusing on both the most vulnerable groups and the type of fairness measure that has the biggest impact on that group. This also helps to close part of the gap between philosophical accounts of distributive justice and the fairness literature observed by Kuppler et al. (Distributive justice and fairness metrics in automated decision-making: how much overlap is there? arXiv preprint arXiv:2105.01441, 2021), and to operationalise the value of fairness.
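Read purely as an illustrative sketch (not the paper's formal proposal), the maximin idea above can be mocked up as: identify the group that fares worst across candidate fairness measures, then prioritise the measure with the biggest impact on that group. All figures below are invented.

```python
# Illustrative sketch of a maximin-style choice among candidate fairness
# measures: focus on the group that fares worst, then pick the measure
# whose violation is largest for that group. Figures are invented.
candidate_measures = {
    # measure name -> per-group "disadvantage" score (higher = worse off)
    "false_negative_rate": {"group_a": 0.10, "group_b": 0.35},
    "selection_rate_gap":  {"group_a": 0.05, "group_b": 0.20},
    "calibration_error":   {"group_a": 0.08, "group_b": 0.12},
}

# Step 1: find the most vulnerable group (worst value over all measures).
worst_group = max(
    {g for scores in candidate_measures.values() for g in scores},
    key=lambda g: max(scores[g] for scores in candidate_measures.values()),
)

# Step 2: among the candidates, prioritise the measure with the biggest
# impact on that group (here: its largest disadvantage score).
priority_measure = max(candidate_measures, key=lambda m: candidate_measures[m][worst_group])

print(f"most vulnerable group: {worst_group}")
print(f"measure to prioritise: {priority_measure}")
```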
In: AI and ethics
ISSN: 2730-5961
Artificial Intelligence, Machine Learning, Statistical Modeling and Predictive Analytics have been widely used in various industries for a long time. More recently, AI Model Governance including AI Ethics has received significant attention from academia, industry, and regulatory agencies. To minimize potential unjustified treatment disfavoring individuals based on demographics, an increasingly critical task is to assess group fairness through some established metrics. Many commercial and open-source tools are now available to support the computations of these fairness metrics. However, this area is largely based on rules, e.g., metrics within a prespecified range would be considered satisfactory. These metrics are statistical estimates and are often based on limited sample data and therefore subject to sampling variability. For instance, if a fairness criterion is barely met or missed, it is often uncertain if it should be a "pass" or "failure," if the sample size is not large. This is where statistical science can help. Specifically, statistical hypothesis testing enables us to determine whether the sample data can support a particular hypothesis (e.g., falling within an acceptable range) or the observations may have happened by chance. Drawing upon the bioequivalence literature from medicine and advanced hypothesis testing in statistics, we propose a practical statistical significance testing method to enhance the current rule-based process for model fairness testing and its associated power calculation, followed by an illustration with a realistic example.
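A minimal sketch of what a bioequivalence-style check could look like for one fairness metric: a two one-sided tests (TOST) procedure on the selection-rate difference between two groups under a normal approximation. The margin, counts and significance level are illustrative assumptions; the paper's actual testing procedure and power calculation may differ.

```python
# Sketch of a two one-sided tests (TOST) equivalence check on the
# difference in selection rates between two groups, using a normal
# approximation. Margins, counts, and alpha are illustrative assumptions.
from math import sqrt
from statistics import NormalDist

def tost_selection_rate(success_a, n_a, success_b, n_b,
                        margin=0.10, alpha=0.05):
    """Test H0: |p_a - p_b| >= margin against H1: |p_a - p_b| < margin."""
    p_a, p_b = success_a / n_a, success_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_lower = (diff + margin) / se   # tests diff > -margin
    z_upper = (diff - margin) / se   # tests diff < +margin
    p_lower = 1 - NormalDist().cdf(z_lower)
    p_upper = NormalDist().cdf(z_upper)
    p_value = max(p_lower, p_upper)  # TOST: both one-sided tests must reject
    return diff, p_value, p_value < alpha

diff, p, equivalent = tost_selection_rate(180, 400, 150, 380)
print(f"rate difference={diff:.3f}, TOST p-value={p:.3f}, within margin: {equivalent}")
```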
Predictive algorithms are increasingly being deployed in a variety of settings to determine legal status. Algorithmic predictions have been used to determine provision of health care and social services, to allocate state resources, and to anticipate criminal behavior or activity. Further applications have been proposed to determine civil and criminal liability or to "personalize" legal default rules. Deployment of such artificial intelligence (AI) systems has properly raised questions of algorithmic bias, fairness, transparency, and due process. But little attention has been paid to the known sociological costs of using predictive algorithms to determine legal status. A large and growing social science literature teaches the effects of "algorithmic living," documenting how humans interact with machine generated assessments. Many of these interactions are socially detrimental, and such corrosive effects are greatly amplified by the increasing speed and ubiquity of digitally automated algorithmic systems. In this Article I link the sociological and legal analysis of AI, highlighting the reflexive social processes that are engaged by algorithmic metrics. This Article examines these overlooked social effects of predictive legal algorithms and contributes to the literature a vital, fundamental, but missing critique of such analytics. Specifically, this Article shows how the problematic social effects of algorithmic legal metrics extend far beyond the concerns about accuracy that have thus far dominated critiques of such metrics. Second, it demonstrates that corrective governance mechanisms such as enhanced due process or transparency will be inadequate to remedy such corrosive effects, and that some such remedies, such as transparency, may actually exacerbate the worst effects of algorithmic governmentality. Third, the Article shows that the application of algorithmic metrics to legal decisions aggravates the latent tensions between equity and autonomy in liberal institutions, undermining democratic values in a manner and on a scale not previously experienced by human societies. Illuminating these effects casts new light on the inherent social costs of AI metrics, particularly the perverse effects of deploying algorithms in legal systems.
BASE
In: Philosophy & technology, Volume 37, Issue 1
ISSN: 2210-5441
Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call "fairness hacking" for the purpose of shrouding unfairness in algorithms. This impacts end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two different categories of fairness hacking in reference to the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding or removing sensitive attributes from the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can be applied to similarly prevent or reduce fairness hacking. The second category of fairness hacking is inter-metric fairness hacking. Inter-metric fairness hacking is the search for a specific fair metric with given attributes. We argue that countermeasures to prevent or reduce inter-metric fairness hacking are still in their infancy. Finally, we demonstrate both types of fairness hacking using real datasets. Our paper intends to serve as guidance for discussions within the fair ML community to prevent or reduce the misuse of fairness metrics, and thus reduce overall harm from ML applications.
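To make the inter-metric variant concrete, the toy sketch below computes several group-fairness gaps for the same predictions and then "reports" only those that fall under a chosen threshold. The data and threshold are invented; the snippet illustrates the practice the authors warn against, not their datasets or countermeasures.

```python
# Toy illustration of inter-metric "fairness hacking": evaluate several
# fairness gaps for the same predictions and report only the one that
# happens to pass a chosen threshold. Data and threshold are invented.
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["a"] * 6 + ["b"] * 6)

def rate(mask, values):
    return values[mask].mean() if mask.any() else np.nan

def gaps(y_true, y_pred, group):
    out = {}
    a, b = group == "a", group == "b"
    out["demographic_parity"] = abs(rate(a, y_pred) - rate(b, y_pred))
    out["tpr_gap"] = abs(rate(a & (y_true == 1), y_pred) - rate(b & (y_true == 1), y_pred))
    out["fpr_gap"] = abs(rate(a & (y_true == 0), y_pred) - rate(b & (y_true == 0), y_pred))
    return out

THRESHOLD = 0.10
all_gaps = gaps(y_true, y_pred, group)
passing = {name: g for name, g in all_gaps.items() if g <= THRESHOLD}

print("all gaps:     ", {k: round(v, 2) for k, v in all_gaps.items()})
print("cherry-picked:", {k: round(v, 2) for k, v in passing.items()})
```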
In: AI and ethics, Volume 1, Issue 4, pp. 529-544
ISSN: 2730-5961
There is growing concern that decision-making informed by machine learning (ML) algorithms may unfairly discriminate based on personal demographic attributes, such as race and gender. Scholars have responded by introducing numerous mathematical definitions of fairness to test the algorithm, many of which are in conflict with one another. However, these reductionist representations of fairness often bear little resemblance to real-life fairness considerations, which in practice are highly contextual. Moreover, fairness metrics tend to be implemented within narrow and targeted fairness toolkits for algorithm assessments that are difficult to integrate into an algorithm's broader ethical assessment. In this paper, we derive lessons from ethical philosophy and welfare economics as they relate to the contextual factors relevant for fairness. In particular, we highlight the debate around the acceptability of particular inequalities and the inextricable links between fairness, welfare and autonomy. We propose Key Ethics Indicators (KEIs) as a way towards providing a more holistic understanding of whether or not an algorithm is aligned to the decision-maker's ethical values.
Ranking algorithms are deployed widely to order a set of items in applications such as search engines, news feeds, and recommendation systems. Recent studies, however, have shown that, left unchecked, the output of ranking algorithms can result in decreased diversity in the type of content presented, promote stereotypes, and polarize opinions. In order to address such issues, we study the following variant of the traditional ranking problem when, in addition, there are fairness or diversity constraints. Given a collection of items along with 1) the value of placing an item in a particular position in the ranking, 2) the collection of sensitive attributes (such as gender, race, political opinion) of each item and 3) a collection of fairness constraints that, for each k, bound the number of items with each attribute that are allowed to appear in the top k positions of the ranking, the goal is to output a ranking that maximizes the value with respect to the original rank quality metric while respecting the constraints. This problem encapsulates various well-studied problems related to bipartite and hypergraph matching as special cases and turns out to be hard to approximate even with simple constraints. Our main technical contributions are fast exact and approximation algorithms along with complementary hardness results that, together, come close to settling the approximability of this constrained ranking maximization problem. Unlike prior work on the approximability of constrained matching problems, our algorithm runs in linear time, even when the number of constraints is (polynomially) large, its approximation ratio does not depend on the number of constraints, and it produces solutions with small constraint violations. Our results rely on insights about the constrained matching problem when the objective function satisfies certain properties that appear in common ranking metrics such as discounted cumulative gain (DCG), Spearman's rho or Bradley-Terry, along with the nested structure of fairness constraints.
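The constrained ranking problem above can be mocked up with a naive greedy heuristic: at each position, place the highest-value remaining item whose attribute count would not exceed its bound in the current top-k prefix. This is only a sketch under an assumed upper-bound constraint, not the linear-time approximation algorithm the abstract refers to; items, values and bounds are invented.

```python
# Naive greedy sketch of ranking under upper-bound fairness constraints:
# at each position, place the highest-value remaining item whose attribute
# would not exceed its allowed count in the current top-k prefix.
# This is NOT the paper's algorithm; items and bounds are invented.
def greedy_constrained_ranking(items, max_per_prefix):
    """items: list of (item_id, attribute, value); higher value = better.
    max_per_prefix(attribute, k): max items with `attribute` allowed in top k."""
    remaining = sorted(items, key=lambda x: -x[2])
    ranking, counts = [], {}
    while remaining:
        k = len(ranking) + 1
        for i, (item_id, attr, value) in enumerate(remaining):
            if counts.get(attr, 0) + 1 <= max_per_prefix(attr, k):
                ranking.append(item_id)
                counts[attr] = counts.get(attr, 0) + 1
                del remaining[i]
                break
        else:
            # No feasible item for this position; constraints cannot be met.
            break
    return ranking

items = [("d1", "m", 0.9), ("d2", "m", 0.8), ("d3", "f", 0.7),
         ("d4", "m", 0.6), ("d5", "f", 0.5)]
# Allow at most ceil(k/2) items of any single attribute in every top-k prefix.
bound = lambda attr, k: (k + 1) // 2
print(greedy_constrained_ranking(items, bound))
```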
BASE
In: AI and ethics
ISSN: 2730-5961
The increasing use of algorithms in predictive policing has raised concerns regarding the potential amplification of societal biases. This study adopts a two-phase approach, encompassing a systematic review and the mitigation of age-related biases in predictive policing. Our systematic review identifies a variety of fairness strategies in existing literature, such as domain knowledge, likelihood function penalties, counterfactual reasoning, and demographic segmentation, with a primary focus on racial biases. However, this review also highlights significant gaps in addressing biases related to other protected attributes, including age, gender, and socio-economic status. Additionally, it is observed that police actions are a major contributor to model discrimination in predictive policing. To address these gaps, our empirical study focuses on mitigating age-related biases within the Chicago Police Department's Strategic Subject List (SSL) dataset used in predicting the risk of being involved in a shooting incident, either as a victim or an offender. We introduce Conditional Score Recalibration (CSR), a novel bias mitigation technique, alongside the established Class Balancing method. CSR involves reassessing and adjusting risk scores for individuals initially assigned moderately high-risk scores, categorizing them as low risk if they meet three criteria: no prior arrests for violent offenses, no previous arrests for narcotic offenses, and no involvement in shooting incidents. Our fairness assessment, utilizing metrics like Equality of Opportunity Difference, Average Odds Difference, and Demographic Parity, demonstrates that this approach significantly improves model fairness without sacrificing accuracy.
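A minimal sketch of the Conditional Score Recalibration rule as described in the abstract: individuals with moderately high risk scores are reassigned to low risk when all three criteria hold. The score band and the value used for "low risk" are assumptions made for illustration, since the abstract does not state the thresholds.

```python
# Sketch of Conditional Score Recalibration (CSR) as described in the
# abstract: reassign moderately-high-risk individuals to low risk when all
# three criteria hold. The score band (350-450 on an SSL-style 0-500 scale)
# and the "low risk" value are assumptions; the paper's thresholds may differ.
from dataclasses import dataclass

@dataclass
class Person:
    risk_score: float          # e.g. SSL-style score
    violent_arrests: int
    narcotic_arrests: int
    shooting_incidents: int

LOW_RISK_SCORE = 100.0         # assumed value representing "low risk"

def conditional_score_recalibration(p: Person,
                                    band=(350.0, 450.0)) -> float:
    """Return the (possibly) recalibrated risk score for one person."""
    moderately_high = band[0] <= p.risk_score <= band[1]
    meets_criteria = (p.violent_arrests == 0
                      and p.narcotic_arrests == 0
                      and p.shooting_incidents == 0)
    if moderately_high and meets_criteria:
        return LOW_RISK_SCORE
    return p.risk_score

# Example: recalibrated vs. unchanged cases (invented records).
print(conditional_score_recalibration(Person(420, 0, 0, 0)))  # -> 100.0
print(conditional_score_recalibration(Person(420, 1, 0, 0)))  # -> 420
```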
Artificial intelligence (AI) and its subfield of machine learning (ML) are a growing sector in the European economy, both in economic value and societal impact. Automated decisions often happen in proprietary black-box systems, where the source code cannot be inspected and the underlying data cannot be controlled from the outside. This opacity raises ethical issues of accountability, fairness and non-discrimination, transparency and interpretability, and of upholding privacy and human rights. The thesis links three existing constructs of fairness (interpretability, protected variables, and metrics) with their corresponding mode of transparency and means of accountable fairness. It proceeds to quantify possible discrimination on population subgroups, and to identify the nature and extent of systematic differences in treatment of subgroups based on classification-error metrics. The thesis expands on existing research on group-level fairness and presents results of subgroup-level metrics that aim to address the problem of fairness gerrymandering, the effect that apparent fairness on a group level is achieved by strong subgroup discrimination. The main findings are that algorithmic decision-making systems are not robust, and provide different levels of consistency and reliability for different subgroups of a population. Additionally, evaluation metrics that observe only the aggregate level fail to pick up cases of discrimination that especially occur in minority groups in a population.
Christopher Emanuel Kittel ; abstracts in German and English ; variant title according to the author's own translation ; Karl-Franzens-Universität Graz, master's thesis, 2019 ; (VLID)3610280
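As a rough sketch of the subgroup-level evaluation the thesis describes (not its actual code or data), the snippet below computes a classification-error metric at group level and again on intersectional subgroups, which is where apparent group-level parity can conceal subgroup discrimination. The population and error rates are simulated.

```python
# Rough sketch of group-level vs. subgroup-level evaluation to surface
# "fairness gerrymandering": parity on a coarse group can mask large
# error-rate disparities on intersectional subgroups. Data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
gender = rng.choice(["m", "f"], n)
age    = rng.choice(["young", "old"], n)
y_true = rng.integers(0, 2, n)

# Simulated predictor: ~40% label flips for young women and old men,
# ~10% elsewhere, so gender-level error rates look similar.
worse = ((gender == "f") & (age == "young")) | ((gender == "m") & (age == "old"))
noise = np.where(worse, rng.random(n) < 0.40, rng.random(n) < 0.10)
y_pred = np.where(noise, 1 - y_true, y_true)

def error_rate(mask):
    return (y_pred[mask] != y_true[mask]).mean()

print("group level:")
for g in ["m", "f"]:
    print(f"  gender={g}: error={error_rate(gender == g):.2f}")

print("subgroup level:")
for g in ["m", "f"]:
    for a in ["young", "old"]:
        mask = (gender == g) & (age == a)
        print(f"  gender={g}, age={a}: error={error_rate(mask):.2f}")
```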
BASE
In: AI & society: the journal of human-centred systems and machine intelligence, Volume 39, Issue 5, pp. 2507-2523
ISSN: 1435-5655
The use of technologies in personnel selection has come under increased scrutiny in recent years, revealing their potential to amplify existing inequalities in recruitment processes. To date, however, there has been a lack of comprehensive assessments of respective discriminatory potentials and no legal or practical standards have been explicitly established for fairness auditing. The current proposal of the Artificial Intelligence Act classifies numerous applications in personnel selection and recruitment as high-risk technologies, and while it requires quality standards to protect the fundamental rights of those involved, particularly during development, it does not provide concrete guidance on how to ensure this, especially once the technologies are commercially available. We argue that comprehensive and reliable auditing of personnel selection technologies must be contextual, that is, embedded in existing processes and based on real data, as well as participative, involving various stakeholders beyond technology vendors and customers, such as advocacy organizations and researchers. We propose an architectural draft that employs a data trustee to provide independent, fiduciary management of personal and corporate data to audit the fairness of technologies used in personnel selection. Drawing on a case study conducted with two state-owned companies in Berlin, Germany, we discuss challenges and approaches related to suitable fairness metrics, operationalization of vague concepts such as migration*, and applicable legal foundations that can be utilized to overcome the fairness-privacy dilemma arising from uncertainties associated with current laws. We highlight issues that require further interdisciplinary research to enable a prototypical implementation of the auditing concept in the mid-term.
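One candidate fairness metric an audit of this kind might start from is the selection-rate ratio (impact ratio) between applicant groups, a four-fifths-style screen familiar from employment-selection practice. The sketch below computes it from pooled counts a data trustee might hold; the group labels and numbers are invented, and how a category such as migration background is operationalised is precisely the open question the abstract raises.

```python
# Sketch of one candidate metric for a personnel-selection audit: the
# selection-rate ratio (impact ratio) between each applicant group and the
# group with the highest selection rate. Group labels and counts are invented.
def impact_ratios(applications: dict, hires: dict) -> dict:
    rates = {g: hires[g] / applications[g] for g in applications}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

applications = {"no_migration_background": 800, "migration_background": 400}
hires        = {"no_migration_background": 120, "migration_background": 40}

for group, ratio in impact_ratios(applications, hires).items():
    flag = "review" if ratio < 0.8 else "ok"     # four-fifths-style screen
    print(f"{group}: impact ratio {ratio:.2f} ({flag})")
```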