All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness

Michael D. Ekstrand; Mucun Tian; Ion Madrazo Azpiazu; Jennifer D. Ekstrand; Oghenemaro Anuyah; David McNeill; Maria Soledad Pera

Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:172-186, 2018.

Abstract

In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes. For example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.
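The abstract's central point is that one aggregate mean over all users can hide differences between user groups. The minimal Python sketch below illustrates that contrast. It assumes binary relevance and uses nDCG@k as an example of a utility-based metric. Treat that metric choice, the function names `ndcg_at_k` and `group_means`, and the toy users and group labels as hypothetical illustrations, not the authors' code, data, or exact protocol.

```python
import math
from collections import defaultdict


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance nDCG@k for one user's ranked recommendation list."""
    # Discounted gain: each relevant item at 0-based position i contributes 1/log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    # Ideal DCG: all relevant items (up to k) placed at the top of the list.
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def group_means(scores, groups):
    """Average per-user scores separately within each user group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user, score in scores.items():
        totals[groups[user]] += score
        counts[groups[user]] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical toy data: recommendation lists, held-out relevant items, group labels.
recs = {"u1": ["a", "b", "c"], "u2": ["a", "c", "b"],
        "u3": ["b", "a", "c"], "u4": ["c", "b", "a"]}
truth = {u: {"a"} for u in recs}
groups = {"u1": "G1", "u2": "G1", "u3": "G1", "u4": "G2"}

scores = {u: ndcg_at_k(recs[u], truth[u]) for u in recs}
overall = sum(scores.values()) / len(scores)  # aggregate mean over all users
means = group_means(scores, groups)           # the same scores, disaggregated by group

print(round(overall, 3), {g: round(m, 3) for g, m in sorted(means.items())})
# prints: 0.783 {'G1': 0.877, 'G2': 0.5}
```

Because the overall mean weights every user equally, the three-user group G1 pulls it well above what the single G2 user receives; reporting `means` alongside `overall` makes that gap visible. The sketch only illustrates the aggregation issue and does not reproduce the paper's data sets, metrics configuration, or statistical analysis.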

Cite this Paper

BibTeX

@InProceedings{pmlr-v81-ekstrand18b,
  title = 	 {All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness},
  author = 	 {Ekstrand, Michael D. and Tian, Mucun and Azpiazu, Ion Madrazo and Ekstrand, Jennifer D. and Anuyah, Oghenemaro and McNeill, David and Pera, Maria Soledad},
  booktitle = 	 {Proceedings of the 1st Conference on Fairness, Accountability and Transparency},
  pages = 	 {172--186},
  year = 	 {2018},
  editor = 	 {Friedler, Sorelle A. and Wilson, Christo},
  volume = 	 {81},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--24 Feb},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v81/ekstrand18b/ekstrand18b.pdf},
  url = 	 {https://proceedings.mlr.press/v81/ekstrand18b.html},
  abstract = 	 {In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes. For example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.}
}

EndNote

%0 Conference Paper
%T All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness
%A Michael D. Ekstrand
%A Mucun Tian
%A Ion Madrazo Azpiazu
%A Jennifer D. Ekstrand
%A Oghenemaro Anuyah
%A David McNeill
%A Maria Soledad Pera
%B Proceedings of the 1st Conference on Fairness, Accountability and Transparency
%C Proceedings of Machine Learning Research
%D 2018
%E Sorelle A. Friedler
%E Christo Wilson
%F pmlr-v81-ekstrand18b
%I PMLR
%P 172--186
%U https://proceedings.mlr.press/v81/ekstrand18b.html
%V 81
%X In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes. For example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.

APA

Ekstrand, M.D., Tian, M., Azpiazu, I.M., Ekstrand, J.D., Anuyah, O., McNeill, D. & Pera, M.S. (2018). All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, in Proceedings of Machine Learning Research 81:172-186. Available from https://proceedings.mlr.press/v81/ekstrand18b.html.
