MAM: Multinomial Attention Masking for Foundation Models on Sparse Single-Cell RNA-seq Data

Amirreza Naziri; Arash Asgari; Aijun An; Eleftherios Sachlos; Laleh Seyyed-Kalantari

Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:295-311, 2026.

Abstract

Single-cell RNA sequencing (scRNA-seq) has transformed biology by enabling the measurement of gene expression across millions of individual cells, revealing cellular heterogeneity that underlies development, disease progression, and treatment response. This has made scRNA-seq a central data modality in modern biology and drug discovery. Recently, transformer-based foundation models (FMs) have shown strong potential for scRNA-seq analysis, but they often rely on random masking during training. Due to the extreme sparsity of scRNA-seq datasets, conventional uniform masking samples genes without considering their biological importance. In this work, we propose Multinomial Attention Masking (MAM), a module that learns which gene positions are most informative to mask at each training step. We define a set of trainable latent vectors that attend over gene embeddings to produce attention maps, from which a multinomial sampler selects the highest-scoring positions for masking. We show that MAM improves FM pretraining performance and consistently outperforms uniform masking on cell-type classification tasks, while adding negligible computational overhead. Our work benefits researchers building FMs for sparse data and those who rely on accurate scRNA-seq analysis to study cell types and disease.
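
The masking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `multinomial_attention_mask`, the scaled dot-product scoring, the averaging of attention maps across latents, the 15% mask ratio, and the fixed (untrained) latent vectors are all assumptions made for illustration. In a real model the latents would be trainable parameters updated alongside the foundation model.

```python
import numpy as np

def multinomial_attention_mask(gene_emb, latents, mask_ratio=0.15, rng=None):
    """Sample a boolean mask over gene positions from latent-to-gene attention.

    gene_emb: (n_genes, d) gene embeddings for one cell.
    latents:  (n_latents, d) latent query vectors (hypothetical; trainable
              in a real model, fixed here).
    Returns (mask, probs): a boolean mask and the sampling distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_genes, d = gene_emb.shape

    # Each latent attends over all gene positions (scaled dot-product scores).
    scores = latents @ gene_emb.T / np.sqrt(d)        # (n_latents, n_genes)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)     # softmax per latent

    # Assumption: combine the per-latent attention maps by averaging.
    probs = attn.mean(axis=0)
    probs = probs / probs.sum()

    # Multinomial draw without replacement: positions with higher attention
    # are more likely, but not guaranteed, to be masked.
    n_mask = max(1, int(round(mask_ratio * n_genes)))
    idx = rng.choice(n_genes, size=n_mask, replace=False, p=probs)

    mask = np.zeros(n_genes, dtype=bool)
    mask[idx] = True
    return mask, probs
```

Calling the function with random embeddings of shape `(200, 16)` and four latents of shape `(4, 16)` returns a mask with 30 positions set to `True`. Replacing `probs` with a constant vector recovers the uniform-masking baseline that the abstract compares against.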

Cite this Paper

BibTeX

@InProceedings{pmlr-v333-naziri26a,
  title = 	 {MAM: Multinomial Attention Masking for Foundation Models on Sparse Single-Cell RNA-seq Data},
  author =       {Naziri, Amirreza and Asgari, Arash and An, Aijun and Sachlos, Eleftherios and Seyyed-Kalantari, Laleh},
  booktitle = 	 {Proceedings of the 7th Conference on Health, Inference, and Learning},
  pages = 	 {295--311},
  year = 	 {2026},
  editor = 	 {Healey, Elizabeth and Fries, Jason and Pollard, Tom and Tang, Shengpu and Zink, Anna and Hartvigsen, Tom and Agrawal, Monica and Finlayson, Sam and Glicksberg, Benjamin and Beaulieu-Jones, Brett and Wang, Kai and Fontalvo, Daseyra and Sarker, Tasmie and Chen, Irene and Alsentzer, Emily},
  volume = 	 {333},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29--30 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v333/main/assets/naziri26a/naziri26a.pdf},
  url = 	 {https://proceedings.mlr.press/v333/naziri26a.html},
  abstract = 	 {Single-cell RNA sequencing (scRNA-seq) has transformed biology by enabling the measurement of gene expression across millions of individual cells, revealing cellular heterogeneity that underlies development, disease progression, and treatment response. This has made scRNA-seq a central data modality in modern biology and drug discovery. Recently, transformer-based foundation models (FMs) have shown strong potential for scRNA-seq analysis, but they often rely on random masking during training. Due to the extreme sparsity of scRNA-seq datasets, conventional uniform masking samples genes without considering their biological importance. In this work, we propose Multinomial Attention Masking (MAM), a module that learns which gene positions are most informative to mask at each training step. We define a set of trainable latent vectors that attend over gene embeddings to produce attention maps, from which a multinomial sampler selects the highest-scoring positions for masking. We show that MAM improves FM pretraining performance and consistently outperforms uniform masking on cell-type classification tasks, while adding negligible computational overhead. Our work benefits researchers building FMs for sparse data and those who rely on accurate scRNA-seq analysis to study cell types and disease.}
}

Endnote

%0 Conference Paper
%T MAM: Multinomial Attention Masking for Foundation Models on Sparse Single-Cell RNA-seq Data
%A Amirreza Naziri
%A Arash Asgari
%A Aijun An
%A Eleftherios Sachlos
%A Laleh Seyyed-Kalantari
%B Proceedings of the 7th Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Elizabeth Healey
%E Jason Fries
%E Tom Pollard
%E Shengpu Tang
%E Anna Zink
%E Tom Hartvigsen
%E Monica Agrawal
%E Sam Finlayson
%E Benjamin Glicksberg
%E Brett Beaulieu-Jones
%E Kai Wang
%E Daseyra Fontalvo
%E Tasmie Sarker
%E Irene Chen
%E Emily Alsentzer
%F pmlr-v333-naziri26a
%I PMLR
%P 295--311
%U https://proceedings.mlr.press/v333/naziri26a.html
%V 333
%X Single-cell RNA sequencing (scRNA-seq) has transformed biology by enabling the measurement of gene expression across millions of individual cells, revealing cellular heterogeneity that underlies development, disease progression, and treatment response. This has made scRNA-seq a central data modality in modern biology and drug discovery. Recently, transformer-based foundation models (FMs) have shown strong potential for scRNA-seq analysis, but they often rely on random masking during training. Due to the extreme sparsity of scRNA-seq datasets, conventional uniform masking samples genes without considering their biological importance. In this work, we propose Multinomial Attention Masking (MAM), a module that learns which gene positions are most informative to mask at each training step. We define a set of trainable latent vectors that attend over gene embeddings to produce attention maps, from which a multinomial sampler selects the highest-scoring positions for masking. We show that MAM improves FM pretraining performance and consistently outperforms uniform masking on cell-type classification tasks, while adding negligible computational overhead. Our work benefits researchers building FMs for sparse data and those who rely on accurate scRNA-seq analysis to study cell types and disease.

APA

Naziri, A., Asgari, A., An, A., Sachlos, E., & Seyyed-Kalantari, L. (2026). MAM: Multinomial Attention Masking for Foundation Models on Sparse Single-Cell RNA-seq Data. Proceedings of the 7th Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 333:295-311. Available from https://proceedings.mlr.press/v333/naziri26a.html.
