Mixed Models with Multiple Instance Learning

Jan P. Engelmann, Alessandro Palma, Jakub M Tomczak, Fabian Theis, Francesco Paolo Casale
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3664-3672, 2024.

Abstract

Predicting patient features from single-cell data can help identify cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL), upholding the advantages of linear models while modeling cell state heterogeneity. By leveraging predefined cell embeddings, MixMIL enhances computational efficiency and aligns with recent advancements in single-cell representation learning. Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets, uncovering new associations and elucidating biological mechanisms across different domains.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-p-engelmann24a,
  title = {Mixed Models with Multiple Instance Learning},
  author = {Engelmann, Jan P. and Palma, Alessandro and Tomczak, Jakub M and Theis, Fabian and Casale, Francesco Paolo},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = {3664--3672},
  year = {2024},
  editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = {238},
  series = {Proceedings of Machine Learning Research},
  month = {02--04 May},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v238/p-engelmann24a/p-engelmann24a.pdf},
  url = {https://proceedings.mlr.press/v238/p-engelmann24a.html},
  abstract = {Predicting patient features from single-cell data can help identify cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL), upholding the advantages of linear models while modeling cell state heterogeneity. By leveraging predefined cell embeddings, MixMIL enhances computational efficiency and aligns with recent advancements in single-cell representation learning. Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets, uncovering new associations and elucidating biological mechanisms across different domains.}
}
Endnote
%0 Conference Paper
%T Mixed Models with Multiple Instance Learning
%A Jan P. Engelmann
%A Alessandro Palma
%A Jakub M Tomczak
%A Fabian Theis
%A Francesco Paolo Casale
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-p-engelmann24a
%I PMLR
%P 3664--3672
%U https://proceedings.mlr.press/v238/p-engelmann24a.html
%V 238
%X Predicting patient features from single-cell data can help identify cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL), upholding the advantages of linear models while modeling cell state heterogeneity. By leveraging predefined cell embeddings, MixMIL enhances computational efficiency and aligns with recent advancements in single-cell representation learning. Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets, uncovering new associations and elucidating biological mechanisms across different domains.
APA
Engelmann, J. P., Palma, A., Tomczak, J. M., Theis, F., & Casale, F. P. (2024). Mixed Models with Multiple Instance Learning. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3664-3672. Available from https://proceedings.mlr.press/v238/p-engelmann24a.html.
