Proceedings of Machine Learning Research

Proceedings of Machine Learning Research Proceedings of the 1st Conference on Fairness, Accountability and Transparency Held in New York, NY, USA on 23-24 February 2018 Published as Volume 81 by the Proceedings of Machine Learning Research on 21 January 2018. Volume Edited by: Sorelle A. Friedler Christo Wilson Series Editors: Neil D. Lawrence Mark Reid https://proceedings.mlr.press/v81/ Sat, 01 Apr 2023 09:37:10 +0000 Sat, 01 Apr 2023 09:37:10 +0000 Jekyll v3.9.3 Keynote 1 Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/sweeney18a.html https://proceedings.mlr.press/v81/sweeney18a.html Potential for Discrimination in Online Targeted Advertising Recently, online targeted advertising platforms like Facebook have been criticized for allowing advertisers to discriminate against users belonging to sensitive groups, i.e., to exclude users belonging to a certain race or gender from receiving their ads. Such criticisms have led, for instance, Facebook to disallow the use of attributes such as ethnic affinity from being used by advertisers when targeting ads related to housing or employment or financial services. In this paper, we show that such measures are far from sufficient and that the problem of discrimination in targeted advertising is much more pernicious. We argue that discrimination measures should be based on the targeted population and not on the attributes used for targeting. We systematically investigate the different targeting methods offered by Facebook for their ability to enable discriminatory advertising. We show that a malicious advertiser can create highly discriminatory ads without using sensitive attributes. Our findings call for exploring fundamentally new methods for mitigating discrimination in online targeted advertising. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/speicher18a.html https://proceedings.mlr.press/v81/speicher18a.html “Meaningful Information” and the Right to Explanation There is no single, neat statutory provision labeled the “right to explanation” in Europe’s new General Data Protection Regulation (GDPR). But nor is such a right illusory. Responding to two prominent papers that, in turn, conjure and critique the right to explanation in the context of automated decision-making, we advocate a return to the text of the GDPR. Articles 13–15 provide rights to “meaningful information about the logic involved” in automated decisions. This is a right to explanation, whether one uses the phrase or not. The right to explanation should be interpreted functionally, flexibly, and should, at a minimum, enable a data subject to exercise his or her rights under the GDPR and human rights law. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/selbst18a.html https://proceedings.mlr.press/v81/selbst18a.html Interpretable Active Learning Active learning has long been a topic of study in machine learning. However, as increasingly complex and opaque models have become standard practice, the process of active learning, too, has become more opaque. There has been little investigation into interpreting what specific trends and patterns an active learning strategy may be exploring. This work expands on the Local Interpretable Model-agnostic Explanations framework (LIME) to provide explanations for active learning recommendations. We demonstrate how LIME can be used to generate locally faithful explanations for an active learning strategy, and how these explanations can be used to understand how different models and datasets explore a problem space over time. These explanations can also be used to generate batches based on common sources of uncertainty. These regions of common uncertainty can be useful for understanding a model’s current weaknesses. In order to quantify the per-subgroup differences in how an active learning strategy queries spatial regions, we introduce a notion of uncertainty bias (based on disparate impact) to measure the discrepancy in the confidence for a model’s predictions between one subgroup and another. Using the uncertainty bias measure, we show that our query explanations accurately reflect the subgroup focus of the active learning queries, allowing for an interpretable explanation of what is being learned as points with similar sources of uncertainty have their uncertainty bias resolved. We demonstrate that this technique can be applied to track uncertainty bias over user-defined clusters or automatically generated clusters based on the source of uncertainty. We also measure how the choice of initial labeled examples effects groups over time. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/phillips18a.html https://proceedings.mlr.press/v81/phillips18a.html The cost of fairness in binary classification Binary classifiers are often required to possess fairness in the sense of not overly discriminating with respect to a feature deemed sensitive e.g. race. We study the inherent tradeoffs in learning classifiers with a fairness constraint in the form of two questions: what is the best accuracy we can expect for a given level of fairness?, and what is the nature of these optimal fairness-aware classifiers? To answer these questions, we provide three main contributions. First, we relate two existing fairness measures to cost-sensitive risks. Second, we show that for such cost-sensitive fairness measures, the optimal classifier is an instance-dependent thresholding of the class-probability function. Third, we relate the tradeoff between accuracy and fairness to the alignment between the target and sensitive features’ class-probabilities. A practical implication of our analysis is a simple approach to the fairness-aware problem which involves suitably thresholding class-probability estimates. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/menon18a.html https://proceedings.mlr.press/v81/menon18a.html Analyze, Detect and Remove Gender Stereotyping from Bollywood Movies The presence of gender stereotypes in many aspects of society is a well-known phenomenon. In this paper, we focus on studying such stereotypes and bias in Hindi movie industry (\it Bollywood) and propose an algorithm to remove these stereotypes from text. We analyze movie plots and posters for all movies released since 1970. The gender bias is detected by semantic modeling of plots at sentence and intra-sentence level. Different features like occupation, introductions, associated actions and descriptions are captured to show the pervasiveness of gender bias and stereotype in movies. Using the derived semantic graph, we compute centrality of each character and observe similar bias there. We also show that such bias is not applicable for movie posters where females get equal importance even though their character has little or no impact on the movie plot. The silver lining is that our system was able to identify 30 movies over last 3 years where such stereotypes were broken. The next step, is to generate debiased stories. The proposed debiasing algorithm extracts gender biased graphs from unstructured piece of text in stories from movies and de-bias these graphs to generate plausible unbiased stories. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/madaan18a.html https://proceedings.mlr.press/v81/madaan18a.html Recommendation Independence This paper studies a recommendation algorithm whose outcomes are not influenced by specified information. It is useful in contexts potentially unfair decision should be avoided, such as job-applicant recommendations that are not influenced by socially sensitive information. An algorithm that could exclude the influence of sensitive information would thus be useful for job-matching with fairness. We call the condition between a recommendation outcome and a sensitive feature Recommendation Independence, which is formally defined as statistical independence between the outcome and the feature. Our previous independence-enhanced algorithms simply matched the means of predictions between sub-datasets consisting of the same sensitive value. However, this approach could not remove the sensitive information represented by the second or higher moments of distributions. In this paper, we develop new methods that can deal with the second moment, i.e., variance, of recommendation outcomes without increasing the computational complexity. These methods can more strictly remove the sensitive information, and experimental results demonstrate that our new algorithms can more effectively eliminate the factors that undermine fairness. Additionally, we explore potential applications for independence-enhanced recommendation, and discuss its relation to other concepts, such as recommendation diversity. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/kamishima18a.html https://proceedings.mlr.press/v81/kamishima18a.html Keynote 2 Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/hellman18a.html https://proceedings.mlr.press/v81/hellman18a.html Preface Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/friedler18a.html https://proceedings.mlr.press/v81/friedler18a.html Runaway Feedback Loops in Predictive Policing Predictive policing systems are increasingly used to determine how to allocate police across a city in order to best prevent crime. Discovered crime data (e.g., arrest counts) are used to help update the model, and the process is repeated. Such systems have been shown susceptible to runaway feedback loops, where police are repeatedly sent back to the same neighborhoods regardless of the true crime rate. In response, we develop a mathematical model of predictive policing that proves why this feedback loop occurs, show empirically that this model exhibits such problems, and demonstrate how to change the inputs to a predictive policing system (in a black-box manner) so the runaway feedback loop does not occur, allowing the true crime rate to be learned. Our results are quantitative: we can establish a link (in our model) between the degree to which runaway feedback causes problems and the disparity in crime rates between areas. Moreover, we can also demonstrate the way in which reported incidents of crime (those reported by residents) and discovered incidents of crime (i.e those directly observed by police officers dispatched as a result of the predictive policing algorithm) interact: in brief, while reported incidents can attenuate the degree of runaway feedback, they cannot entirely remove it without the interventions we suggest. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/ensign18a.html https://proceedings.mlr.press/v81/ensign18a.html All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes. For example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/ekstrand18b.html https://proceedings.mlr.press/v81/ekstrand18b.html Privacy for All: Ensuring Fair and Equitable Privacy Protections In this position paper, we argue for applying recent research on ensuring sociotechnical systems are fair and non-discriminatory to the privacy protections those systems may provide. Privacy literature seldom considers whether a proposed privacy scheme protects all persons uniformly, irrespective of membership in protected classes or particular risk in the face of privacy failure. Just as algorithmic decision-making systems may have discriminatory outcomes even without explicit or deliberate discrimination, so also privacy regimes may disproportionately fail to protect vulnerable members of their target population, resulting in disparate impact with respect to the effectiveness of privacy protections.We propose a research agenda that will illuminate this issue, along with related issues in the intersection of fairness and privacy, and present case studies that show how the outcomes of this research may change existing privacy and fairness research. We believe it is important to ensure that technologies and policies intended to protect the users and subjects of information systems provide such protection in an equitable fashion. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/ekstrand18a.html https://proceedings.mlr.press/v81/ekstrand18a.html Decoupled Classifiers for Group-Fair and Efficient Machine Learning When it is ethical and legal to use a sensitive attribute (such as gender or race) in machine learning systems, the question remains how to do so. We show that the naive application of machine learning algorithms using sensitive attributes leads to an inherent tradeoff in accuracy between groups. We provide a simple and efficient decoupling technique, that can be added on top of any black-box machine learning algorithm, to learn different classifiers for different groups. Transfer learning is used to mitigate the problem of having too little data on any one group. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/dwork18a.html https://proceedings.mlr.press/v81/dwork18a.html Mixed Messages? The Limits of Automated Social Media Content Analysis Governments and companies are turning to automated tools to make sense of what people post on social media. Policymakers routinely call for social media companies to identify and take down hate speech, terrorist propaganda, harassment, “fake news” or disinformation. Other policy proposals have focused on mining social media to inform law enforcement and immigration decisions. But these proposals wrongly assume that automated technology can accomplish on a large scale the kind of nuanced analysis that humans can do on a small scale. Today’s tools for analyzing social media text have limited ability to parse the meaning of human communication or detect the intent of the speaker. A knowledge gap exists between data scientists studying natural language processing (NLP) and policymakers advocating for wide adoption of automated social media analysis and moderation. Policymakers must understand the capabilities and limits of NLP before endorsing or adopting automated content analysis tools, particularly for making decisions that affect fundamental rights or access to government benefits. Without proper safeguards, these tools can facilitate overbroad censorship and biased enforcement of laws or terms of service. This paper draws on existing research to explain the capabilities and limitations of text classifiers for social media posts and other online content. It is aimed at helping researchers and technical experts address the gaps in policymakers’ knowledge about what is possible with automated text analysis. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/duarte18a.html https://proceedings.mlr.press/v81/duarte18a.html Discrimination in Online Advertising: A Multidisciplinary Inquiry We explore ways in which discrimination may arise in the targeting of job-related advertising, noting the potential for multiple parties to contribute to its occurrence. We then examine the statutes and case law interpreting the prohibition on advertisements that indicate a preference based on protected class, and consider its application to online advertising. We focus on its interaction with Section 230 of the Communications Decency Act, which provides interactive computer services with immunity for providing access to information created by a third party. We argue that such services can lose that immunity if they target ads toward or away from protected classes without explicit instructions from advertisers to do so. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/datta18a.html https://proceedings.mlr.press/v81/datta18a.html A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions Every year there are more than 3.6 million referrals made to child protection agencies across the US. The practice of screening calls is left to each jurisdiction to follow local practices and policies, potentially leading to large variation in the way in which referrals are treated across the country. Whilst increasing access to linked administrative data is available, it is difficult for welfare workers to make systematic use of historical information about all the children and adults on a single referral call. Risk prediction models that use routinely collected administrative data can help call workers to better identify cases that are likely to result in adverse outcomes. However, the use of predictive analytics in the area of child welfare is contentious. There is a possibility that some communities—such as those in poverty or from particular racial and ethnic groups—will be disadvantaged by the reliance on government administrative data. On the other hand, these analytics tools can augment or replace human judgments, which themselves are biased and imperfect. In this paper we describe our work on developing, validating, fairness auditing, and deploying a risk prediction model in Allegheny County, Pennsylvania, USA. We discuss the results of our analysis to-date, and also highlight key problems and data bias issues that present challenges for model evaluation and deployment. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/chouldechova18a.html https://proceedings.mlr.press/v81/chouldechova18a.html Balanced Neighborhoods for Multi-sided Fairness in Recommendation Fairness has emerged as an important category of analysis for machine learning systems in some application areas. In extending the concept of fairness to recommender systems, there is an essential tension between the goals of fairness and those of personalization. However, there are contexts in which equity across recommendation outcomes is a desirable goal. It is also the case that in some applications fairness may be a multisided concept, in which the impacts on multiple groups of individuals must be considered. In this paper, we examine two different cases of fairness-aware recommender systems: consumer-centered and provider-centered. We explore the concept of a balanced neighborhood as a mechanism to preserve personalization in recommendation while enhancing the fairness of recommendation outcomes. We show that a modified version of the Sparse Linear Method (SLIM) can be used to improve the balance of user and item neighborhoods, with the result of achieving greater outcome fairness in real-world datasets with minimal loss in ranking performance. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/burke18a.html https://proceedings.mlr.press/v81/burke18a.html Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/buolamwini18a.html https://proceedings.mlr.press/v81/buolamwini18a.html Fairness in Machine Learning: Lessons from Political Philosophy What does it mean for a machine learning model to be ‘fair’, in terms which can be operationalised? Should fairness consist of ensuring everyone has an equal probability of obtaining some benefit, or should we aim instead to minimise the harms to the least advantaged? Can the relevant ideal be determined by reference to some alternative state of affairs in which a particular social pattern of discrimination does not exist? Various definitions proposed in recent literature make different assumptions about what terms like discrimination and fairness mean and how they can be defined in mathematical terms. Questions of discrimination, egalitarianism and justice are of significant interest to moral and political philosophers, who have expended significant efforts in formalising and defending these central concepts. It is therefore unsurprising that attempts to formalise ‘fairness’ in machine learning contain echoes of these old philosophical debates. This paper draws on existing work in moral and political philosophy in order to elucidate emerging debates about fair machine learning. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/binns18a.html https://proceedings.mlr.press/v81/binns18a.html Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment Actuarial risk assessments are frequently touted as a neutral way to counteract implicit bias and increase the fairness of decisions made at almost every juncture of the criminal justice system, from pretrial release to sentencing, parole and probation. In recent times these assessments have come under increased scrutiny, as critics claim that the statistical techniques underlying them might reproduce existing patterns of discrimination and historical biases that are reflected in the data. Much of this debate is centered around competing notions of fairness and predictive accuracy, which seek to problematize the use of variables that act as “proxies” for protected classes, such as race and gender. However, these debates fail to address the core ethical issue at hand - that current risk assessments are ill-equipped to support ethical punishment and rehabilitation practices in the criminal justice system, because they offer only a limited insight into the underlying drivers of criminal behavior. In this paper, we examine the prevailing paradigms of fairness currently under debate and propose an alternative methodology for identifying the underlying social and structural factors that drive criminal behavior. We argue that the core ethical debate surrounding the use of regression in risk assessments is not one of bias or accuracy. Rather, it’s one of purpose. If machine learning is operationalized merely in the service of predicting future crime, then it becomes difficult to break cycles of criminalization that are driven by the iatrogenic effects of the criminal justice system itself. We posit that machine learning should not be used for prediction, rather it should be used to surface covariates that are fed into a causal model for understanding the social, structural and psychological drivers of crime. We propose an alternative application of machine learning and causal inference away from predicting risk scores to risk mitigation. Sun, 21 Jan 2018 00:00:00 +0000 https://proceedings.mlr.press/v81/barabas18a.html https://proceedings.mlr.press/v81/barabas18a.html