Optimal Transfer Learning for Missing Not-at-Random Matrix Completion

Akhil Jalan, Yassir Jedra, Arya Mazumdar, Soumendu Sundar Mukherjee, Purnamrita Sarkar
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:26797-26829, 2025.

Abstract

We study transfer learning for matrix completion in a Missing Not-at-Random (MNAR) setting that is motivated by biological problems. The target matrix $Q$ has entire rows and columns missing, making estimation impossible without side information. To address this, we use a noisy and incomplete source matrix $P$, which relates to $Q$ via a feature shift in latent space. We consider both the active and passive sampling of rows and columns. We establish minimax lower bounds for entrywise estimation error in each setting. Our computationally efficient estimation framework achieves this lower bound for the active setting, which leverages the source data to query the most informative rows and columns of $Q$. This avoids the need for incoherence assumptions required for rate optimality in the passive sampling setting. We demonstrate the effectiveness of our approach through comparisons with existing algorithms on real-world biological datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-jalan25a,
  title     = {Optimal Transfer Learning for Missing Not-at-Random Matrix Completion},
  author    = {Jalan, Akhil and Jedra, Yassir and Mazumdar, Arya and Mukherjee, Soumendu Sundar and Sarkar, Purnamrita},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {26797--26829},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/jalan25a/jalan25a.pdf},
  url       = {https://proceedings.mlr.press/v267/jalan25a.html},
  abstract  = {We study transfer learning for matrix completion in a Missing Not-at-Random (MNAR) setting that is motivated by biological problems. The target matrix $Q$ has entire rows and columns missing, making estimation impossible without side information. To address this, we use a noisy and incomplete source matrix $P$, which relates to $Q$ via a feature shift in latent space. We consider both the active and passive sampling of rows and columns. We establish minimax lower bounds for entrywise estimation error in each setting. Our computationally efficient estimation framework achieves this lower bound for the active setting, which leverages the source data to query the most informative rows and columns of $Q$. This avoids the need for incoherence assumptions required for rate optimality in the passive sampling setting. We demonstrate the effectiveness of our approach through comparisons with existing algorithms on real-world biological datasets.}
}
Endnote
%0 Conference Paper
%T Optimal Transfer Learning for Missing Not-at-Random Matrix Completion
%A Akhil Jalan
%A Yassir Jedra
%A Arya Mazumdar
%A Soumendu Sundar Mukherjee
%A Purnamrita Sarkar
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-jalan25a
%I PMLR
%P 26797--26829
%U https://proceedings.mlr.press/v267/jalan25a.html
%V 267
%X We study transfer learning for matrix completion in a Missing Not-at-Random (MNAR) setting that is motivated by biological problems. The target matrix $Q$ has entire rows and columns missing, making estimation impossible without side information. To address this, we use a noisy and incomplete source matrix $P$, which relates to $Q$ via a feature shift in latent space. We consider both the active and passive sampling of rows and columns. We establish minimax lower bounds for entrywise estimation error in each setting. Our computationally efficient estimation framework achieves this lower bound for the active setting, which leverages the source data to query the most informative rows and columns of $Q$. This avoids the need for incoherence assumptions required for rate optimality in the passive sampling setting. We demonstrate the effectiveness of our approach through comparisons with existing algorithms on real-world biological datasets.
APA
Jalan, A., Jedra, Y., Mazumdar, A., Mukherjee, S. S. & Sarkar, P. (2025). Optimal Transfer Learning for Missing Not-at-Random Matrix Completion. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:26797-26829. Available from https://proceedings.mlr.press/v267/jalan25a.html.
