MissScore: High-Order Score Estimation in the Presence of Missing Data

Wenqin Liu, Haoze Hou, Erdun Gao, Biwei Huang, Qiuhong Ke, Howard Bondell, Mingming Gong
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:38664-38691, 2025.

Abstract

Score-based generative models are essential in various machine learning applications, with strong capabilities in generation quality. In particular, high-order derivatives (scores) of data density offer deep insights into data distributions, building on the proven effectiveness of first-order scores for modeling and generating synthetic data, unlocking new possibilities for applications. However, learning them typically requires complete data, which is often unavailable in domains such as healthcare and finance due to data corruption, acquisition constraints, or incomplete records. To tackle this challenge, we introduce MissScore, a novel framework for estimating high-order scores in the presence of missing data. We derive objective functions for estimating high-order scores under different missing data mechanisms and propose a new algorithm specifically designed to handle missing data effectively. Our empirical results demonstrate that MissScore accurately and efficiently learns the high-order scores from incomplete data and generates high-quality samples, resulting in strong performance across a range of downstream tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25aa,
  title = {{M}iss{S}core: High-Order Score Estimation in the Presence of Missing Data},
  author = {Liu, Wenqin and Hou, Haoze and Gao, Erdun and Huang, Biwei and Ke, Qiuhong and Bondell, Howard and Gong, Mingming},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {38664--38691},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25aa/liu25aa.pdf},
  url = {https://proceedings.mlr.press/v267/liu25aa.html},
  abstract = {Score-based generative models are essential in various machine learning applications, with strong capabilities in generation quality. In particular, high-order derivatives (scores) of data density offer deep insights into data distributions, building on the proven effectiveness of first-order scores for modeling and generating synthetic data, unlocking new possibilities for applications. However, learning them typically requires complete data, which is often unavailable in domains such as healthcare and finance due to data corruption, acquisition constraints, or incomplete records. To tackle this challenge, we introduce MissScore, a novel framework for estimating high-order scores in the presence of missing data. We derive objective functions for estimating high-order scores under different missing data mechanisms and propose a new algorithm specifically designed to handle missing data effectively. Our empirical results demonstrate that MissScore accurately and efficiently learns the high-order scores from incomplete data and generates high-quality samples, resulting in strong performance across a range of downstream tasks.}
}
Endnote
%0 Conference Paper
%T MissScore: High-Order Score Estimation in the Presence of Missing Data
%A Wenqin Liu
%A Haoze Hou
%A Erdun Gao
%A Biwei Huang
%A Qiuhong Ke
%A Howard Bondell
%A Mingming Gong
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25aa
%I PMLR
%P 38664--38691
%U https://proceedings.mlr.press/v267/liu25aa.html
%V 267
%X Score-based generative models are essential in various machine learning applications, with strong capabilities in generation quality. In particular, high-order derivatives (scores) of data density offer deep insights into data distributions, building on the proven effectiveness of first-order scores for modeling and generating synthetic data, unlocking new possibilities for applications. However, learning them typically requires complete data, which is often unavailable in domains such as healthcare and finance due to data corruption, acquisition constraints, or incomplete records. To tackle this challenge, we introduce MissScore, a novel framework for estimating high-order scores in the presence of missing data. We derive objective functions for estimating high-order scores under different missing data mechanisms and propose a new algorithm specifically designed to handle missing data effectively. Our empirical results demonstrate that MissScore accurately and efficiently learns the high-order scores from incomplete data and generates high-quality samples, resulting in strong performance across a range of downstream tasks.
APA
Liu, W., Hou, H., Gao, E., Huang, B., Ke, Q., Bondell, H. & Gong, M. (2025). MissScore: High-Order Score Estimation in the Presence of Missing Data. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:38664-38691. Available from https://proceedings.mlr.press/v267/liu25aa.html.

Related Material