Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Y Jin, Wenrui Ma, Vidya Muthukumar, Eva L Dyer
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:23343-23366, 2024.

Abstract

Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pre-trained encoders, and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pre-trained features may have unknown biases that can be diagnosed through their spectra.
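The abstract does not define spectral imbalance precisely. The sketch below is only an illustration under our own assumptions, not the authors' method. It treats a class's "spectrum" as the eigenvalues of its class-conditional feature covariance. It then compares two classes with a hypothetical score, `spectral_gap`, which is the L1 distance between their sorted spectra and is not necessarily the measure used in the paper. All function names here (`class_covariance`, `jacobi_eigenvalues`, `per_class_spectra`, `spectral_gap`) are hypothetical. Eigenvalues come from a pure-Python cyclic Jacobi method so the example has no dependencies.

```python
import math
from collections import defaultdict


def class_covariance(xs):
    """Population covariance (divide by n) of a list of d-dimensional points."""
    n, d = len(xs), len(xs[0])
    mean = [sum(x[j] for x in xs) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for x in xs:
        centered = [x[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / n
    return cov


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted descending."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                # A <- A P (update columns p and q)
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- P^T A (update rows p and q)
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def per_class_spectra(features, labels):
    """Map each class label to the sorted eigenvalues of its feature covariance."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    return {y: jacobi_eigenvalues(class_covariance(xs)) for y, xs in sorted(by_class.items())}


def spectral_gap(spec_a, spec_b):
    """Hypothetical score: L1 distance between two sorted spectra."""
    return sum(abs(u - v) for u, v in zip(spec_a, spec_b))


# Toy 2-D features: both classes have 4 samples (balanced counts),
# but class 1 is stretched along the diagonal.
features = [(1, 0), (-1, 0), (0, 1), (0, -1),
            (2, 2), (-2, -2), (1, -1), (-1, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

spectra = per_class_spectra(features, labels)
print(spectra)
print(spectral_gap(spectra[0], spectra[1]))
```

In this toy data the two classes have equal sample counts. Class 0 still has a flat spectrum of [0.5, 0.5], while class 1's spectrum is approximately [4.0, 1.0]. This is the kind of "balanced data, imbalanced spectra" situation the title refers to. With real encoder features, `numpy.linalg.eigvalsh` on each class covariance would replace the hand-written Jacobi routine.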

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kaushik24a,
  title = {Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance},
  author = {Kaushik, Chiraag and Liu, Ran and Lin, Chi-Heng and Khera, Amrit and Jin, Matthew Y and Ma, Wenrui and Muthukumar, Vidya and Dyer, Eva L},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {23343--23366},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kaushik24a/kaushik24a.pdf},
  url = {https://proceedings.mlr.press/v235/kaushik24a.html},
  abstract = {Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pre-trained encoders, and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pre-trained features may have unknown biases that can be diagnosed through their spectra.}
}
Endnote
%0 Conference Paper
%T Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance
%A Chiraag Kaushik
%A Ran Liu
%A Chi-Heng Lin
%A Amrit Khera
%A Matthew Y Jin
%A Wenrui Ma
%A Vidya Muthukumar
%A Eva L Dyer
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kaushik24a
%I PMLR
%P 23343--23366
%U https://proceedings.mlr.press/v235/kaushik24a.html
%V 235
%X Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pre-trained encoders, and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pre-trained features may have unknown biases that can be diagnosed through their spectra.
APA
Kaushik, C., Liu, R., Lin, C., Khera, A., Jin, M.Y., Ma, W., Muthukumar, V. & Dyer, E.L. (2024). Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:23343-23366. Available from https://proceedings.mlr.press/v235/kaushik24a.html.
