- title: 'Learning Disease Progression Models That Capture Health Disparities'
  abstract: 'Disease progression models are widely used to inform the diagnosis and treatment of many progressive diseases. However, a significant limitation of existing models is that they do not account for health disparities that can bias the observed data. To address this, we develop an interpretable Bayesian disease progression model that captures three key health disparities: certain patient populations may (1) start receiving care only when their disease is more severe, (2) experience faster disease progression even while receiving care, or (3) receive follow-up care less frequently conditional on disease severity. We show theoretically and empirically that failing to account for disparities produces biased estimates of severity (underestimating severity for disadvantaged groups, for example). On a dataset of heart failure patients, we show that our model can identify groups that face each type of health disparity, and that accounting for these disparities meaningfully shifts which patients are considered high-risk.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/chiang25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/chiang25a/chiang25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-chiang25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Erica
    family: Chiang
  - given: Divya M
    family: Shanmugam
  - given: Ashley
    family: Beecy
  - given: Gabriel
    family: Sayer
  - given: Deborah
    family: Estrin
  - given: Nikhil
    family: Garg
  - given: Emma
    family: Pierson
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 1-29
  id: chiang25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 1
  lastpage: 29
  published: 2025-07-02 00:00:00 +0000
- title: 'Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs'
  abstract: 'Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods often focus on report-to-report similarities and fail to reveal the models’ understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from processed reports to construct a comprehensive radiology knowledge graph. We then propose three metrics to evaluate the similarity of nodes, distribution of edges, and coverage of subgraphs across various knowledge graphs. Using these metrics, we conduct an in-depth comparative analysis of AI-generated and human-written radiology reports, assessing the performance of both specialist and generalist models. Our study provides a deeper understanding of the capabilities and limitations of current AI models in report generation, offering valuable insights for improving model performance and clinical applicability.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/zhang25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/zhang25a/zhang25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-zhang25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Xiaoman
    family: Zhang
  - given: Julian Nicolas
    family: Acosta
  - given: Hong-Yu
    family: Zhou
  - given: Pranav
    family: Rajpurkar
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 30-42
  id: zhang25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 30
  lastpage: 42
  published: 2025-07-02 00:00:00 +0000
- title: 'KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings'
  abstract: 'Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically-refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Evaluations on structured EHR from UK Biobank demonstrate that KEEP outperforms both traditional and LLM-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP’s minimal computational requirements make it particularly suitable for resource-constrained environments.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/elhussein25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/elhussein25a/elhussein25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-elhussein25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Ahmed
    family: Elhussein
  - given: Paul
    family: Meddeb
  - given: Abigail
    family: Newbury
  - given: Jeanne
    family: Mirone
  - given: Martin
    family: Stoll
  - given: Gamze
    family: Gursoy
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 43-62
  id: elhussein25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 43
  lastpage: 62
  published: 2025-07-02 00:00:00 +0000
- title: 'Benchmarking ECG Delineation using Deep Neural Network-based Semantic Segmentation Models'
  abstract: 'Accurate electrocardiogram (ECG) delineation is essential for automated cardiac diagnosis, enabling the precise identification of key waveforms such as the P wave, QRS complex, and T wave. This study presents the first comprehensive benchmarking of neural network-based semantic segmentation models for ECG delineation, evaluating their accuracy, resource efficiency, and robustness across both public and private datasets. Our results demonstrate that convolutional neural network (CNN)-based approaches consistently achieve superior accuracy compared to other network architectures. Additionally, we observed the presence of fragmented segments in the delineation results. To address this issue, we explored post-processing techniques to consolidate or eliminate fragmented segments using an optimal configuration, leading to performance improvements. Furthermore, by analyzing performance variations across different waveform labels, we provide critical insights into key considerations for ECG segmentation tasks. Notably, our findings also reveal that larger model sizes do not necessarily correlate with better performance. Based on our findings, we propose a set of practical guidelines for leveraging segmentation models in ECG delineation, offering valuable direction for future research and clinical applications.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/park25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/park25a/park25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-park25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Jaeho
    family: Park
  - given: TaeJun
    family: Park
  - given: Joon-myoung
    family: Kwon
  - given: Yong-Yeon
    family: Jo
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 63-88
  id: park25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 63
  lastpage: 88
  published: 2025-07-02 00:00:00 +0000
- title: 'Electrocardiogram–Language Model for Few-Shot Question Answering with Meta Learning'
  abstract: 'Electrocardiogram (ECG) interpretation requires specialized expertise, often involving synthesizing insights from ECG signals with complex clinical queries posed in natural language. The scarcity of labeled ECG data coupled with the diverse nature of clinical inquiries presents a significant challenge for developing robust and adaptable ECG diagnostic systems. This work introduces a novel multimodal meta-learning method for few-shot ECG question answering, addressing the challenge of limited labeled data while leveraging the rich knowledge encoded within large language models (LLMs). Our LLM-agnostic approach integrates a pre-trained ECG encoder with a frozen LLM (e.g., LLaMA and Gemma) via a trainable fusion module, enabling the language model to reason about ECG data and generate clinically meaningful answers. Extensive experiments demonstrate superior generalization to unseen diagnostic tasks compared to supervised baselines, achieving notable performance even with limited ECG leads. For instance, in a 5-way 5-shot setting, our method using LLaMA-3.1-8B achieves an accuracy of 84.6%, 77.3%, and 69.6% on single verify, choose and query question types, respectively. These results highlight the potential of our method to enhance clinical ECG interpretation by combining signal processing with the nuanced language understanding capabilities of LLMs, particularly in data-constrained scenarios.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/tang25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/tang25a/tang25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-tang25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Jialu
    family: Tang
  - given: Tong
    family: Xia
  - given: Yuan
    family: Lu
  - given: Cecilia
    family: Mascolo
  - given: Aaqib
    family: Saeed
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 89-104
  id: tang25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 89
  lastpage: 104
  published: 2025-07-02 00:00:00 +0000
- title: 'A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs'
  abstract: 'Synthetic Electronic Health Records (EHRs) offer a valuable opportunity to create privacy-preserving and harmonized structured data, supporting numerous applications in healthcare. Key benefits of synthetic data include precise control over the data schema, improved fairness and representation of patient populations, and the ability to share datasets without concerns about compromising real individuals’ privacy. Consequently, the AI community has increasingly turned to Large Language Models (LLMs) to generate synthetic data across various domains. However, a significant challenge in healthcare is ensuring that synthetic health records reliably generalize across different hospitals, a long-standing issue in the field. In this work, we evaluate the current state of commercial LLMs for generating synthetic data and investigate multiple aspects of the generation process to identify areas where these models excel and where they fall short. Our main finding from this work is that while LLMs can reliably generate synthetic health records for smaller subsets of features, they struggle to preserve realistic distributions and correlations as the dimensionality of the data increases, ultimately limiting their ability to generalize across diverse hospital settings.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/lin25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/lin25a/lin25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-lin25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Yihan
    family: Lin
  - given: Zhirong
    family: Yu
  - given: Simon A.
    family: Lee
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 105-129
  id: lin25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 105
  lastpage: 129
  published: 2025-07-02 00:00:00 +0000
- title: 'Multiaccuracy for Subpopulation Calibration Over Distribution Shift in Medical Prediction Models'
  abstract: 'Multiaccuracy was previously demonstrated to improve subpopulation calibration in medical prediction models, ensuring fairness towards subpopulations. Medical prediction models often experience degraded performance due to distribution shifts (e.g., changes in input data resulting from changes in space or time). The effectiveness of multiaccuracy in ensuring medical predictors’ fairness under these circumstances has been suggested theoretically but has yet to be studied empirically. To explore this, we trained prediction models using real-world data, applied an adaptation of multiaccuracy as a post-processing step to intersecting subpopulations defined by combinations of protected features such as age, gender, and socioeconomic status, and tested the performance of the models on target test sets from distributions different from those of the development cohorts. The results demonstrated that the improvement in subpopulation calibration achieved by multiaccuracy was maintained in the target distribution over two experiments, simulating spatial-temporal and migration-induced distribution shifts. On average, over the two experiments, Calibration in the Large mean error and variance measures were reduced by 71.8% and 70.7%, respectively, on the target distributions after applying multiaccuracy. These findings highlight the potential of post-processing for multiaccuracy as a tool for enhancing the fairness and reliability of medical prediction models across diverse populations, even under circumstances of major distribution shifts.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/kapash25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/kapash25a/kapash25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-kapash25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Kapash
  - given: Noam
    family: Barda
  - given: Omer
    family: Reingold
  - given: Noa
    family: Dagan
  - given: Ran
    family: Balicer
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 130-144
  id: kapash25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 130
  lastpage: 144
  published: 2025-07-02 00:00:00 +0000
- title: 'WatchSleepNet: A Novel Model and Pretraining Approach for Advancing Sleep Staging with Smartwatches'
  abstract: 'Sleep monitoring is essential for assessing overall health and managing sleep disorders, yet clinical adoption of consumer wearables remains limited due to inconsistent performance and the scarcity of open-source datasets and transparent codebases. In this study, we introduce WatchSleepNet, a novel, open-source three-stage sleep staging algorithm. The model uses a sequence-to-sequence architecture integrating Residual Networks (ResNet), Temporal Convolutional Networks (TCN), and Long Short-Term Memory (LSTM) networks with self-attention to effectively capture both spatial and temporal dependencies crucial for sleep staging. To address the limited availability of high-quality wearable photoplethysmography (PPG) datasets, WatchSleepNet leverages inter-beat interval (IBI) signals as a shared representation across polysomnography (PSG) and PPG modalities. By pretraining on large PSG datasets and fine-tuning on wrist-worn PPG signals, the model achieved a REM F1 score of 0.642 +/- 0.072 and a Cohen’s Kappa of 0.684 +/- 0.051, surpassing previous state-of-the-art methods. To promote transparency and further research, we publicly release our model and codebase, advancing reproducibility and accessibility in wearable sleep research and enabling the development of more robust, clinically viable sleep monitoring solutions.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/wang25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/wang25a/wang25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-wang25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Will Ke
    family: Wang
  - given: Bill
    family: Chen
  - given: Jiamu
    family: Yang
  - given: Hayoung
    family: Jeong
  - given: Leeor
    family: Hershkovich
  - given: Shekh Md Mahmudul
    family: Islam
  - given: Mengde
    family: Liu
  - given: Ali R
    family: Roghanizad
  - given: Md Mobashir Hasan
    family: Shandhi
  - given: Andrew R
    family: Spector
  - given: Jessilyn
    family: Dunn
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 145-165
  id: wang25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 145
  lastpage: 165
  published: 2025-07-02 00:00:00 +0000
- title: 'Contrastive Pretraining for Stress Detection with Multimodal Wearable Sensor Data and Surveys'
  abstract: 'Stress adversely affects mental and physical health, which underscores the importance of early detection. Some studies have utilized physiological signals from wearable sensors and other information to monitor stress levels in daily life. Recent studies use self-supervised methods due to the high cost of collecting stress labels. However, self-supervised learning using both time series and tabular features such as demographics, traits, and contextual information has been understudied. Therefore, there is a need to further investigate how a model can be effectively trained with multimodal data of different granularities and a limited number of labels. In this study, we introduce a self-supervised multimodal learning approach for stress detection that combines time series and tabular features. Our proposed method presents a promising solution for effectively monitoring stress using multimodal data.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/yang25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/yang25a/yang25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-yang25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Zeyu
    family: Yang
  - given: Han
    family: Yu
  - given: Akane
    family: Sano
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 166-178
  id: yang25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 166
  lastpage: 178
  published: 2025-07-02 00:00:00 +0000
- title: 'Causal considerations can determine the utility of machine learning assisted GWAS'
  abstract: 'Machine Learning (ML) is increasingly employed to generate health-related traits (phenotypes) for genetic discovery, either by imputing existing phenotypes into larger cohorts or by creating novel phenotypes. While these ML-derived phenotypes can significantly increase sample size, and thereby empower genetic discovery, they can also inflate the false discovery rate (FDR). Recent research has focused on developing estimators that leverage both true and machine-learned phenotypes to properly control false positives. Our work complements these efforts by exploring how the true positive rate (TPR) and FDR depend on the causal relationships among the inputs to the ML model, the true phenotypes, and the environment. Using a simulation-based framework, we study causal architectures in which the machine-learned proxy phenotype is derived from biomarkers (i.e., ML model input features) either causally upstream or downstream of the target phenotype (ML model output). We show that no inflation of the false discovery rate occurs when the proxy phenotype is generated from upstream biomarkers, but that false discoveries can occur when the proxy phenotype is generated from downstream biomarkers. Next, we show that power to detect genetic variants truly associated with the target trait depends on its genetic component and correlation with the proxy trait. However, the source of the correlation is key to evaluating a proxy phenotype’s utility for genetic discovery. We demonstrate that evaluating machine-learned proxy phenotypes using out-of-sample predictive performance (e.g., test $R^2$) provides a poor lens on utility. This is because overall predictive performance does not differentiate between genetic and environmental components. In addition to parsing these properties of machine-learned phenotypes via simulations, we further illustrate them using real-world data from the UK Biobank.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/mukherjee25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/mukherjee25a/mukherjee25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-mukherjee25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Sumit
    family: Mukherjee
  - given: Zachary R
    family: McCaw
  - given: David
    family: Amar
  - given: Rounak
    family: Dey
  - given: Thomas W
    family: Soare
  - given: Hari
    family: Somineni
  - given: Nicholas
    family: Eriksson
  - given: Colm
    family: O’Dushlaine
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 179-193
  id: mukherjee25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 179
  lastpage: 193
  published: 2025-07-02 00:00:00 +0000
- title: 'Conditional Front-door Adjustment for Heterogeneous Treatment Assignment Effect Estimation Under Non-compliance'
  abstract: 'Estimates of heterogeneous treatment assignment effects are valuable when making treatment decisions. In the presence of non-compliance (e.g., patients do not adhere to their assigned treatment), the standard backdoor adjustment (SBD) and the conditional front-door adjustment (CFD) can both recover unbiased estimates of the treatment assignment effects. Therefore, which is more suitable depends on their estimation variance. From the existing literature, it is unclear which of the two produces lower-variance estimates. In this work, we demonstrate theoretically and empirically that CFD yields lower-variance estimates than SBD when the true effect of treatment assignment is small. Additionally, since CFD requires estimating multiple nuisance parameters, we introduce LobsterNet, a multi-task neural network that implements CFD with joint modeling. Empirically, LobsterNet reduces estimation error across several semi-synthetic and real-world datasets compared to baselines. Our findings suggest CFD with shared nuisance parameter modeling can improve treatment assignment effect estimation under non-compliance.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/chen25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/chen25a/chen25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-chen25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Winston
    family: Chen
  - given: Trenton
    family: Chang
  - given: Jenna
    family: Wiens
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 194-230
  id: chen25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 194
  lastpage: 230
  published: 2025-07-02 00:00:00 +0000
- title: 'CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning'
  abstract: 'Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. These findings highlight the transformative potential of integrating audio analysis with language-based reasoning to address key challenges in medical diagnostics, opening new possibilities for scalable, data-efficient AI systems capable of supporting real-world clinical decision-making.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/wang25b.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/wang25b/wang25b.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-wang25b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Tsai-Ning
    family: Wang
  - given: Lin-Lin
    family: Chen
  - given: Neil
    family: Zeghidour
  - given: Aaqib
    family: Saeed
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 231-246
  id: wang25b
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 231
  lastpage: 246
  published: 2025-07-02 00:00:00 +0000
- title: 'Towards Predicting Temporal Changes in a Patient’s Chest X-ray Images based on Electronic Health Records'
  abstract: 'Chest X-ray (CXR) is an important diagnostic tool widely used in hospitals to assess patient conditions and monitor changes over time. Recently, generative models, specifically diffusion-based models, have shown promise in generating realistic synthetic CXRs. However, these models mainly focus on conditional generation using single-time-point data, i.e., generating CXRs conditioned on their corresponding reports from a specific time. This limits their clinical utility, particularly for capturing temporal changes. To address this limitation, we propose a novel framework, EHRXDiff, which predicts future CXR images by integrating previous CXRs with subsequent medical events, e.g., prescriptions and lab measures. Our framework dynamically tracks and predicts disease progression based on a latent diffusion model, conditioned on the previous CXR image and a history of medical events. We comprehensively evaluate the performance of our framework across three key aspects: clinical consistency, demographic consistency, and visual realism. Results show that our framework generates high-quality, realistic future images that effectively capture potential temporal changes. This suggests that our framework could be further developed to support clinical decision-making and provide valuable insights for patient monitoring and treatment planning in the medical field.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/kyung25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/kyung25a/kyung25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-kyung25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Daeun
    family: Kyung
  - given: Junu
    family: Kim
  - given: Tackeun
    family: Kim
  - given: Edward
    family: Choi
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 247-267
  id: kyung25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 247
  lastpage: 267
  published: 2025-07-02 00:00:00 +0000
- title: 'Beyond Prompting: Time2Lang - Bridging Time-Series Foundation Models and Large Language Models for Health Sensing'
  abstract: 'Large language models (LLMs) show promise for health applications when combined with behavioral sensing data. Traditional approaches convert sensor data into text prompts, but this process is prone to errors, computationally expensive, and requires domain expertise. These challenges are particularly acute when processing extended time series data. While time series foundation models (TFMs) have recently emerged as powerful tools for learning representations from temporal data, bridging TFMs and LLMs remains challenging. Here, we present Time2Lang, a framework that directly maps TFM outputs to LLM representations without intermediate text conversion. Our approach first trains on synthetic data using periodicity prediction as a pretext task, followed by evaluation on mental health classification tasks. We validate Time2Lang on two longitudinal wearable and mobile sensing datasets: daily depression prediction using step count data (17,251 days from 256 participants) and flourishing classification based on conversation duration (46 participants over 10 weeks). Time2Lang maintains consistent inference times regardless of input length, unlike traditional prompting methods. The generated embeddings preserve essential time-series characteristics such as auto-correlation. Our results demonstrate that TFMs and LLMs can be effectively integrated while minimizing information loss and enabling performance transfer across these distinct modeling paradigms. This work establishes a foundation for future research combining general-purpose large models for complex healthcare tasks.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/pillai25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/pillai25a/pillai25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-pillai25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Arvind
    family: Pillai
  - given: Dimitris
    family: Spathis
  - given: Subigya
    family: Nepal
  - given: Amanda C.
    family: Collins
  - given: Daniel M
    family: Mackin
  - given: Michael V.
    family: Heinz
  - given: Tess Z
    family: Griffin
  - given: Nicholas C.
    family: Jacobson
  - given: Andrew
    family: Campbell
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 268-288
  id: pillai25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 268
  lastpage: 288
  published: 2025-07-02 00:00:00 +0000
- title: 'When Attention Fails: Pitfalls of Attention-based Model Interpretability for High-dimensional Clinical Time-Series'
  abstract: 'Attention-based deep learning models are widely used for clinical time-series analysis, largely due to their perceived ability to enhance model interpretability. However, the reliability, faithfulness, and consistency of attention mechanisms as an interpretability tool in high-dimensional clinical time series data require further investigation. We conducted a comprehensive evaluation of consistency and faithfulness of attention mechanisms in deep learning models applied to high-dimensional clinical time-series data. Specifically, we trained 1000 different variants of an attention-based LSTM model architecture with random initializations to analyze the consistency of attention scores across mortality prediction and patient severity group classification. Our findings revealed significant inconsistencies in attention scores for individual samples across the thousand model variants. Visual inspection of attention weight distributions indicated that the attention mechanism did not consistently focus on the same feature-time pairs, challenging the assumption of faithfulness and reliability in model interpretability. The observed inconsistencies in per-sample attention weights suggest that attention mechanisms are unreliable as an interpretability tool for clinical decision-making tasks involving high-dimensional time-series data. While attention mechanisms may enhance model performance metrics, they often fail to produce clinically meaningful and consistent interpretations, limiting their utility in healthcare settings where transparency is critical for informed decision-making.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/yadav25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/yadav25a/yadav25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-yadav25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Shashank
    family: Yadav
  - given: Vignesh
    family: Subbian
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 289-305
  id: yadav25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 289
  lastpage: 305
  published: 2025-07-02 00:00:00 +0000
- title: 'Global Deep Forecasting with Patient-Specific Pharmacokinetics'
  abstract: 'Forecasting healthcare time series data is vital for early detection of adverse outcomes and patient monitoring. However, it can be challenging in practice due to variable medication administration and unique pharmacokinetic (PK) properties of each patient. To address these challenges, we propose a novel hybrid global-local architecture and a PK encoder that informs deep learning models of patient-specific treatment effects. We showcase the efficacy of our approach in achieving significant accuracy gains in a blood glucose forecasting task using both realistically simulated and real-world data. Our PK encoder surpasses baselines by up to 16.4% on simulated data and 5.3% on real-world data for individual patients during critical events of severely high and low glucose levels. Furthermore, our proposed hybrid global-local architecture outperforms patient-specific PK models by 15.8%, on average.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/potosnak25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/potosnak25a/potosnak25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-potosnak25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Willa
    family: Potosnak
  - given: Cristian Ignacio
    family: Challu
  - given: Kin G.
    family: Olivares
  - given: Keith A
    family: Dufendach
  - given: Artur
    family: Dubrawski
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 306-336
  id: potosnak25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 306
  lastpage: 336
  published: 2025-07-02 00:00:00 +0000
- title: 'ExOSITO: Explainable Off-Policy Learning with Side Information for Intensive Care Unit Blood Test Orders'
  abstract: 'Ordering a minimal subset of lab tests for patients in the intensive care unit (ICU) can be challenging. Care teams must balance ensuring the availability of the right information against reducing the clinical burden and costs associated with each lab test order. Most in-patient settings experience frequent over-ordering of lab tests, but are now aiming to reduce this burden on both hospital resources and the environment. This paper develops a novel method that combines off-policy learning with privileged information to identify the optimal set of ICU lab tests to order. Our approach, EXplainable Off-policy learning with Side Information for ICU blood Test Orders (ExOSITO), creates an interpretable assistive tool for clinicians to order lab tests by considering both the observed and predicted future status of each patient. We pose this problem as a causal bandit trained using offline data and a novel reward function derived from clinically-approved rules; we introduce a novel learning framework that integrates clinical knowledge with observational data to bridge the gap between the optimal and logging policies. The learned policy function provides interpretable clinical information and reduces costs without omitting any vital lab orders, outperforming both a physician’s policy and prior approaches to this practical problem.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/ji25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/ji25a/ji25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-ji25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Zongliang
    family: Ji
  - given: Andre Carlos Kajdacsy-Balla
    family: Amaral
  - given: Anna
    family: Goldenberg
  - given: Rahul G
    family: Krishnan
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 337-368
  id: ji25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 337
  lastpage: 368
  published: 2025-07-02 00:00:00 +0000
- title: 'Distributionally Robust Learning in Survival Analysis'
  abstract: 'We introduce an innovative approach that incorporates $\textit{Distributionally Robust Learning (DRL)}$ into Cox regression to enhance the robustness and accuracy of survival predictions. By formulating a DRL framework with a Wasserstein distance-based ambiguity set, we develop a variant Cox model that is less sensitive to assumptions about the underlying data distribution and more resilient to model misspecification and data perturbations. By leveraging Wasserstein duality, we reformulate the original min-max DRL problem into a tractable regularized empirical risk minimization problem, which can be computed by exponential conic programming. We provide guarantees on the finite sample behavior of our DRL-Cox model. Moreover, through extensive simulations and real-world case studies, we demonstrate that our regression model achieves superior performance in terms of prediction accuracy and robustness compared with traditional methods.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/jin25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/jin25a/jin25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-jin25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Yeping
    family: Jin
  - given: Lauren
    family: Wise
  - given: Ioannis
    family: Paschalidis
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 369-380
  id: jin25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 369
  lastpage: 380
  published: 2025-07-02 00:00:00 +0000
- title: 'Test-Time Calibration: A Framework for Personalized Test-Time Adaptation in Real-World Biosignals'
  abstract: 'Test-Time Adaptation (TTA) methods have been widely used to enhance model robustness by continuously updating pre-trained models with unlabeled target data. However, in real-world biosignal applications, where factors such as age, lifestyle, and comorbidities induce significant variability, traditional TTA often falls short in addressing personalization needs. To satisfy such needs, we introduce a novel Test-Time Calibration (TTC) framework that integrates continuous self-supervised adaptation on unlabeled samples with periodic supervised calibration using the sporadically available ground-truth labels. Our approach leverages a model equipped with dual heads for supervised learning (SL) and self-supervised learning (SSL), and further incorporates a dual buffer along with a weighted batch sampling strategy to effectively manage and utilize both data types during the test phase. We evaluate our framework on two distinct datasets: the publicly available PulseDB, a benchmark for cuff-less blood pressure estimation, and a private ICU dataset collected from critically ill patients. Experimental results demonstrate that our approach improves blood pressure prediction accuracy and robustness, highlighting its suitability for dynamic, personalized biosignal applications.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/jo25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/jo25a/jo25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-jo25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Yong-Yeon
    family: Jo
  - given: Byeong Tak
    family: Lee
  - given: Jeong-Ho
    family: Hong
  - given: Hak Seung
    family: Lee
  - given: Joon-myoung
    family: Kwon
  - given: Beom Joon
    family: Kim
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 381-394
  id: jo25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 381
  lastpage: 394
  published: 2025-07-02 00:00:00 +0000
- title: 'ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice'
  abstract: 'Detecting arousals during sleep is crucial for diagnosing sleep disorders, yet the adoption of Machine Learning (ML) in clinical practice is hindered by a mismatch between clinical protocols and ML methods. Clinicians typically annotate only arousal onsets, whereas ML approaches conventionally rely on annotations for both the beginning and end. Moreover, no standardized evaluation methodology exists that is tailored to the specific needs of arousal detection in clinical practice. We address these challenges by proposing a novel post-processing and evaluation framework, Approximate Localization and Precise Event Count (ALPEC), which optimizes arousal detectors to reflect operational priorities. We further advocate focusing on arousal onset detection and assess the impact of this on current training and evaluation schemes, addressing associated simplifications and challenges. Finally, we introduce a novel polysomnographic dataset that reflects the aforementioned clinical annotation constraints and includes modalities absent from existing datasets, demonstrating the benefits of leveraging multimodal data for arousal onset detection. Our contributions significantly advance the integration of ML-based arousal detection into clinical settings, narrowing the gap between technological advancements and clinical requirements.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/kraft25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/kraft25a/kraft25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-kraft25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Stefan
    family: Kraft
  - given: Andreas
    family: Theissler
  - given: Vera
    family: Wienhausen-Wilke
  - given: Philipp
    family: Walter
  - given: Gjergji
    family: Kasneci
  - given: Hendrik
    family: Lensch
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 395-429
  id: kraft25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 395
  lastpage: 429
  published: 2025-07-02 00:00:00 +0000
- title: 'Treatment Non-Adherence Bias in Clinical Machine Learning: A Real-World Study on Hypertension Medication'
  abstract: 'Machine learning systems trained on electronic health records (EHRs) increasingly guide treatment decisions, but their reliability depends on the critical assumption that patients follow the prescribed treatments recorded in EHRs. Using EHR data from 3,623 hypertension patients, we investigate how treatment non-adherence introduces implicit bias that can fundamentally distort both causal inference and predictive modeling. By extracting patient adherence information from clinical notes using a large language model, we identify 786 patients (21.7%) with medication non-adherence. We further uncover key demographic and clinical factors associated with non-adherence, as well as patient-reported reasons including side effects and difficulties obtaining refills. Our findings demonstrate that this implicit bias can not only reverse estimated treatment effects, but also degrade model performance by up to 5% while disproportionately affecting vulnerable populations by exacerbating disparities in decision outcomes and model error rates. This highlights the importance of accounting for treatment non-adherence in developing responsible and equitable clinical machine learning systems.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/liang25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/liang25a/liang25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-liang25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhongyuan
    family: Liang
  - given: Arvind
    family: Suresh
  - given: Irene Y.
    family: Chen
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 430-442
  id: liang25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 430
  lastpage: 442
  published: 2025-07-02 00:00:00 +0000
- title: 'Predicting Health States of Patients with Chronic Pain from Cellphone Usage Data'
  abstract: 'This study followed patients suffering from chronic pain and aimed to predict their health states. To this end, we conducted a clinical study in which patients were digitally monitored via clinically validated questionnaires (SF-36 and EQ-5D) and continuously collected cellphone usage data. We present a novel two-step approach for utilizing the immense amounts of unlabeled cellular logs in a supervised, binary classification problem and predicting patient-reported outcomes from objective cellphone usage data. Reaching an accuracy of 0.827 for women and 0.898 for men, our classification results show the feasibility of using cellphone monitoring data for patients’ state prediction. Such a capability may enrich periodic clinical assessments with frequent digital follow-ups, assist in disease management for chronic patients, and raise awareness whenever necessary.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/stemmer25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/stemmer25a/stemmer25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-stemmer25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Maya
    family: Stemmer
  - given: Lior
    family: Ungar
  - given: Talia
    family: Friedman
  - given: Lihi
    family: Bik
  - given: Yotam
    family: Hadari
  - given: Itamar
    family: Efrati
  - given: Yarden
    family: Rachamim
  - given: Lior
    family: Carmi
  - given: Shai
    family: Fine
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 443-457
  id: stemmer25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 443
  lastpage: 457
  published: 2025-07-02 00:00:00 +0000
- title: 'Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?'
  abstract: 'Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: for example, we find evidence that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin’s impact on LLM outputs.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/yun25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/yun25a/yun25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-yun25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Hye Sun
    family: Yun
  - given: Karen Y.C.
    family: Zhang
  - given: Ramez
    family: Kouzy
  - given: Iain James
    family: Marshall
  - given: Junyi Jessy
    family: Li
  - given: Byron C
    family: Wallace
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 458-479
  id: yun25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 458
  lastpage: 479
  published: 2025-07-02 00:00:00 +0000
- title: 'Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases'
  abstract: 'Missing data is pervasive in healthcare. Many imputation methods exist to fill in missing values, yet most were evaluated using randomly deleted values rather than the actual mechanisms they were designed to address. We aimed to determine real-world accuracy on all types of missing data (missing completely at random, MCAR; missing at random, MAR; and not missing at random, NMAR) for state-of-the-art and commonly used imputation methods. Using two time series data targets (continuous glucose monitoring, Loop dataset; heart rate, All of Us dataset), we simulated missingness for each mechanism at a range of missingness percentages (5-30%) and tested 12 imputation methods. We evaluated accuracy with multiple metrics including root mean square error (RMSE) and bias. We found that overall, accuracy was significantly better on MCAR than on MAR and NMAR, despite many methods being developed for those mechanisms. Linear interpolation had the lowest RMSE with all mechanisms and for all demographic groups, with low bias. This study shows that current evaluation practices do not provide an accurate picture of real-world performance with realistic patterns of missingness. Future research is needed to develop evaluation practices that better capture real-world accuracy, and methods that better address real-world mechanisms.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/toye25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/toye25a/toye25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-toye25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Adedolapo Aishat
    family: Toye
  - given: Asuman
    family: Celik
  - given: Samantha
    family: Kleinberg
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 480-501
  id: toye25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 480
  lastpage: 501
  published: 2025-07-02 00:00:00 +0000
- title: 'Multi-View Contrastive Learning for Robust Domain Adaptation in Medical Time Series Analysis'
  abstract: 'Adapting machine learning models to medical time series across different domains remains a challenge due to complex temporal dependencies and dynamic distribution shifts. Current approaches often focus on isolated feature representations, limiting their ability to fully capture the intricate temporal dynamics necessary for robust domain adaptation. In this work, we propose a novel framework leveraging multi-view contrastive learning to integrate temporal patterns, derivative-based dynamics, and frequency-domain features. Our method employs independent encoders and a hierarchical fusion mechanism to learn feature-invariant representations that are transferable across domains while preserving temporal coherence. Extensive experiments on diverse medical datasets, including electroencephalogram (EEG), electrocardiogram (ECG), and electromyography (EMG), demonstrate that our approach significantly outperforms state-of-the-art methods in transfer learning tasks. By advancing the robustness and generalizability of machine learning models, our framework offers a practical pathway for deploying reliable AI systems in diverse healthcare settings.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/oh25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/oh25a/oh25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-oh25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: YongKyung
    family: Oh
  - given: Alex
    family: Bui
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 502-526
  id: oh25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 502
  lastpage: 526
  published: 2025-07-02 00:00:00 +0000
- title: 'CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports'
  abstract: 'Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable dense information extraction from case reports but are rarely evaluated for this task. We introduce CaseReportBench, an expert-crafted dataset for dense information extraction of case reports (focusing on IEMs). Using this dataset, we assess various models and prompting methods, introducing novel strategies of category-specific prompting and subheading-filtered data integration. Zero-shot chain-of-thought offers little advantage over zero-shot prompting. Category-specific prompting improves alignment to the benchmark. Open-source Qwen2.5:7B outperforms GPT-4o for this task. Our clinician evaluations show that LLMs can extract clinically relevant details from case reports, supporting rare disease diagnosis and management, while highlighting areas for improvement, such as LLMs’ limitations in recognizing negative findings for differential diagnosis. This work advances LLM-driven clinical NLP, paving the way for scalable, privacy-conscious medical AI applications.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/zhang25b.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/zhang25b/zhang25b.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-zhang25b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Xiao Yu Cindy
    family: Zhang
  - given: Carlos R.
    family: Ferreira
  - given: Francis
    family: Rossignol
  - given: Raymond T.
    family: Ng
  - given: Wyeth
    family: Wasserman
  - given: Jian
    family: Zhu
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 527-542
  id: zhang25b
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 527
  lastpage: 542
  published: 2025-07-02 00:00:00 +0000
- title: 'Feasibility of Immersive Virtual Reality and Customized Robotics with Wearable Sensors for Upper Extremity Training'
  abstract: 'Upper limb impairment significantly impacts daily activities and quality of life. Traditional robotic systems have been widely used in neurological rehabilitation applications. However, their adoption has been limited to laboratory and clinical settings due to cost constraints. Our study aimed to assess the feasibility and usability of a cost-effective virtual reality (VR) system for home-based upper limb training. We used a customized wearable sleeve sensor to assess the hand and elbow joint movements objectively. A pilot user study (n = 16) with healthy participants evaluated system usability, task load, and presence under two conditions: VR alone and VR combined with a customized inverse kinematics robot arm (KinArm). Statistical analysis using a two-way repeated-measures ANOVA revealed no significant difference between conditions in task completion time. However, significant differences were observed in the normalized number of mistakes and recorded elbow joint angles between tasks. Our findings highlight the potential advantages of an immersive and multi-sensory approach towards performance assessment. This study explores avenues for the development of potentially cost-effective, tailored, and engaging environments for home-based therapy applications.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/kiafar25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/kiafar25a/kiafar25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-kiafar25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Behdokht
    family: Kiafar
  - given: Pinar
    family: Kullu
  - given: Rakshith
    family: Lokesh
  - given: Amit
    family: Chaudhari
  - given: Qile
    family: Wang
  - given: Shayla
    family: Sharmin
  - given: Sagar M.
    family: Doshi
  - given: Elham
    family: Bakhshipour
  - given: Erik
    family: Thostenson
  - given: Joshua
    family: Cashaback
  - given: Roghayeh Leila
    family: Barmaki
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 543-556
  id: kiafar25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 543
  lastpage: 556
  published: 2025-07-02 00:00:00 +0000
- title: 'Bridging the utility gap between MALDI-TOF and WGS for affordable outbreak cluster detection'
  abstract: 'Rapid and accurate detection of emerging outbreak clusters can help contain the spread of diseases with epidemic potential. Among the available pathogen matching methods that can be used to support the task, whole genome sequencing (WGS) offers the highest discriminatory power but is expensive and time-consuming. On the other hand, Matrix-Assisted Laser Desorption Ionization–Time of Flight (MALDI-TOF) mass spectrometry is gaining attention for being a rapid and cost-effective, albeit less precise, alternative. In order to combine the strengths of both MALDI-TOF and WGS, we present MSMAP, the first machine learning framework that establishes a mapping between MALDI-TOF mass spectra and the single nucleotide polymorphism (SNP) distances obtained from WGS analysis. We demonstrate the effectiveness of MSMAP in retrieving WGS-defined outbreak clusters on synthetic mass spectrum data and on proprietary data with paired MALDI-TOF and SNP information. The results show that MSMAP augments MALDI-TOF with the discriminatory power of WGS, thus bridging their utility gap and paving the way toward fast, accurate and cost-effective outbreak cluster detection.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/liu25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/liu25a/liu25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-liu25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Chang
    family: Liu
  - given: Jieshi
    family: Chen
  - given: Lee H
    family: Harrison
  - given: Artur
    family: Dubrawski
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 557-572
  id: liu25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 557
  lastpage: 572
  published: 2025-07-02 00:00:00 +0000
- title: 'The Impact of Medication Non-adherence on Adverse Outcomes: Evidence from Schizophrenia Patients via Survival Analysis'
  abstract: 'This study aims to quantify the association between non-adherence to antipsychotic medications and adverse outcomes among individuals with schizophrenia. We frame this problem in the context of survival analysis, looking at the time until the earliest of several types of adverse outcomes (early death, involuntary hospitalization, jail booking). We refer to this time duration as the adverse event time. We apply standard causal inference tools (T-learner, S-learner, and nearest neighbor matching) with various survival models to estimate individual and average treatment effects in terms of differences in mean adverse event times, where the treatment corresponds to medication non-adherence. We repeat our analysis using different amounts of longitudinal information available per individual (3, 6, 9, and 12 months). Using real data from a county’s administrative records, our results show strong evidence that medication non-adherence is associated with earlier adverse outcomes, advancing the onset of an adverse event by approximately 1 to 4 months. Ablation studies confirm that risk scores provided by the county account for key confounders, as their removal amplifies the estimated effects of non-adherence. Finally, subgroup analyses by medication formulation (injectable vs. oral) and by specific medication type consistently show that non-adherence is associated with earlier adverse outcomes. These findings underscore the clinical importance of medication adherence in delaying severe psychiatric crises and show that integrating survival analysis with causal inference tools can yield policy-relevant insights in complex healthcare settings. We caution that although we use causal inference tools, we only make associative claims; we discuss the validity of some assumptions that would enable us to rigorously convert our claims into causal ones.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/noroozizadeh25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/noroozizadeh25a/noroozizadeh25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-noroozizadeh25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Shahriar
    family: Noroozizadeh
  - given: Pim
    family: Welle
  - given: Jeremy
    family: Weiss
  - given: George H.
    family: Chen
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 573-609
  id: noroozizadeh25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 573
  lastpage: 609
  published: 2025-07-02 00:00:00 +0000
- title: 'Investigating Primary Care Indications to Improve the Quality of Electronic Health Record Data in Target Trial Emulation for Dementia'
  abstract: 'Missing data, inaccuracies in medication lists, and recording delays in electronic health records (EHR) are major limitations for target trial emulation (TTE), which uses EHR data to retrospectively emulate a clinical trial. EHR-based TTE relies on recorded data that proxy actual drug exposures and outcomes. While prior work has proposed various methods to improve EHR data quality, here we investigate the under-utilized consideration that encounters with a primary care provider (PCP) may result in more accurate data in the EHR. Patients with a PCP within the EHR network being studied tend to have more encounters overall and a greater proportion of the types of encounters that yield comprehensive and up-to-date records. By contrasting data for patients with and without a PCP in the considered EHR network, we demonstrate how PCP status affects EHR data quality. Through a case study, we then empirically examine the impact on TTE of including a PCP status feature either in the propensity score and outcome models or as an eligibility criterion for cohort selection, versus ignoring it. Specifically, we compare the estimated effects of two first-line antidiabetic drug classes on the onset of Alzheimer’s Disease and Related Dementias. We find that the estimated treatment effect is sensitive to the consideration of PCP status, particularly when used as an eligibility criterion. Our work suggests that further researching the role of PCP status may improve the design of pragmatic trials.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/sunog25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/sunog25a/sunog25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-sunog25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Max I
    family: Sunog
  - given: Colin
    family: Magdamo
  - given: Marie-Laure
    family: Charpignon
  - given: Mark W.
    family: Albers
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 610-648
  id: sunog25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 610
  lastpage: 648
  published: 2025-07-02 00:00:00 +0000
- title: 'HeadCT-ONE: Enabling Granular and Controllable Automated Evaluation of Head CT Radiology Report Generation'
  abstract: 'We present Head CT Ontology Normalized Evaluation (HeadCT-ONE), a metric for evaluating head CT report generation through ontology-normalized entity and relation extraction. HeadCT-ONE enhances current information extraction derived metrics (such as RadGraph F1) by implementing entity normalization through domain-specific ontologies, addressing radiological language variability. HeadCT-ONE compares normalized entities and relations, allowing for controllable weighting of different entity types or specific entities. Through experiments on head CT reports from three health systems, we show that HeadCT-ONE’s normalization and weighting approach improves the capture of semantically equivalent reports, better distinguishes between normal and abnormal reports, and aligns with radiologists’ assessment of clinically significant errors, while offering flexibility to prioritize specific aspects of report content. Our results demonstrate how HeadCT-ONE enables more flexible, controllable, and granular automated evaluation of head CT reports.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/acosta25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/acosta25a/acosta25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-acosta25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Julian Nicolas
    family: Acosta
  - given: Xiaoman
    family: Zhang
  - given: Siddhant
    family: Dogra
  - given: Hong-Yu
    family: Zhou
  - given: Seyedmehdi
    family: Payabvash
  - given: Guido J.
    family: Falcone
  - given: Eric Karl
    family: Oermann
  - given: Pranav
    family: Rajpurkar
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 649-671
  id: acosta25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 649
  lastpage: 671
  published: 2025-07-02 00:00:00 +0000
- title: 'Predicting Partially Observed Long-Term Outcomes with Adversarial Positive-Unlabeled Domain Adaptation'
  abstract: 'Predicting long-term clinical outcomes often requires large-scale training data with sufficiently long follow-up. However, in electronic health records (EHR) data, long-term labels may not be available for contemporary patient cohorts. Given the dynamic nature of clinical practice, models that rely on historical training data may not perform optimally. In this work, we frame the problem as a positive–unlabeled domain adaptation task, where we seek to adapt from a fully labeled source domain (e.g., historical data) to a partially labeled target domain (e.g., contemporary data). We propose an adversarial framework that includes three core components: (1) Overall Alignment, to match feature distributions between source and target domains; (2) Partial Alignment, to map source negatives to unlabeled target samples; and (3) Conditional Alignment, to address conditional shift using available positive labels in the target domain. We evaluate our method on a benchmark digit classification task (SVHN-MNIST), and two real-world EHR applications: prediction of one-year mortality post COVID-19, and long-term prediction of neurodevelopmental conditions (NDC) in children. In all settings, our approach consistently outperforms baseline models and, in most cases, achieves performance close to an oracle model trained with fully observed labels.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/yan25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/yan25a/yan25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-yan25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Mengying
    family: Yan
  - given: Meng
    family: Xia
  - given: Wei Angel
    family: Huang
  - given: Chuan
    family: Hong
  - given: Benjamin
    family: Goldstein
  - given: Matthew M.
    family: Engelhard
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 672-690
  id: yan25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 672
  lastpage: 690
  published: 2025-07-02 00:00:00 +0000
- title: 'Learning Interactions Between Continuous Treatments and Covariates with a Semiparametric Model'
  abstract: 'Estimating the impact of continuous treatment variables (e.g., dosage amount) on binary outcomes presents significant challenges in modeling and estimation because many existing approaches make strong assumptions that do not hold for certain continuous treatment variables. For instance, traditional logistic regression makes strong linearity assumptions that do not hold for continuous treatment variables like time of initiation. In this work, we propose a semiparametric regression framework that decomposes effects into two interpretable components: a prognostic score that captures baseline outcome risk based on a combination of clinical, genetic, and sociodemographic features, and a treatment-interaction score that flexibly models the optimal treatment level via a nonparametric link function. By connecting these two parametric scores with Nadaraya–Watson regression, our approach is both interpretable and flexible. The potential of our approach is demonstrated through numerical simulations that show empirical estimation convergence. We conclude by applying our approach to a real-world case study using the International Warfarin Pharmacogenomics Consortium (IWPC) dataset to show our approach’s clinical utility by deriving personalized warfarin dosing recommendations that integrate both genetic and clinical data, providing insights towards enhancing patient safety and therapeutic efficacy in anticoagulation therapy.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/jiang25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/jiang25a/jiang25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-jiang25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Muyan
    family: Jiang
  - given: Yunkai
    family: Zhang
  - given: Anil
    family: Aswani
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 691-707
  id: jiang25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 691
  lastpage: 707
  published: 2025-07-02 00:00:00 +0000
- title: 'The Latentverse: An Open-Source Benchmarking Toolkit for Evaluating Latent Representations'
  abstract: 'Self-supervised representation learning is a powerful approach for extracting meaningful features without relying on large amounts of labeled data, making it particularly valuable in fields like healthcare. This enables pretrained models to be shared and fine-tuned with minimal data for various downstream applications. However, evaluating the quality and behavior of these representations remains challenging. To address this, we introduce Latentverse, an open-source library and web-based platform for evaluating latent representations. Latentverse generates detailed reports with visualizations and metrics that provide a comprehensive perspective on different properties of representations, such as clustering, disentanglement, generalization, expressiveness, and robustness. It also allows for the comparison of different representations, enabling developers to refine model architectures and helping users assess how well an embedding model aligns with the requirements of their specific applications.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/turura25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/turura25a/turura25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-turura25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Yoanna
    family: Turura
  - given: Sam Freesun
    family: Friedman
  - given: Aurora
    family: Cremer
  - given: Mahnaz
    family: Maddah
  - given: Sana
    family: Tonekaboni
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 708-719
  id: turura25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 708
  lastpage: 719
  published: 2025-07-02 00:00:00 +0000
- title: 'How does my language model understand clinical text?'
  abstract: 'Large language models (LLMs) have performed well across various clinical natural language processing tasks, despite not being directly trained on electronic health record (EHR) data. In this work, we examine how popular open-source LLMs learn clinical information from large mined corpora through two crucial but understudied lenses: (1) their interpretation of clinical jargon, a foundational ability for understanding real-world clinical notes, and (2) their responses to medical misinformation. For both use cases, we investigate the frequency of relevant clinical information in their corresponding pretraining corpora, the relationship between pretraining data composition and model outputs, and the sources underlying this data. To isolate clinical jargon understanding, we evaluate LLMs on a new dataset, MedLingo. Unsurprisingly, we find that the frequency of clinical jargon mentions across major pretraining corpora correlates with model performance. However, jargon that appears frequently in clinical notes often appears rarely in pretraining corpora, revealing a mismatch between available data and real-world usage. Similarly, we find that a non-negligible portion of documents support disputed claims that can then be parroted by models. Finally, we classify and analyze the types of online sources in which clinical jargon and misinformation appear, with implications for future dataset composition.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/jia25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/jia25a/jia25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-jia25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Furong
    family: Jia
  - given: David
    family: Sontag
  - given: Monica
    family: Agrawal
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 720-743
  id: jia25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 720
  lastpage: 743
  published: 2025-07-02 00:00:00 +0000
- title: 'Multi-Objective Fine-Tuning of Clinical Scoring Tables: Adapting to Variations in Demography and Data'
  abstract: 'Clinical scoring tables (e.g., CURB-65 for pneumonia severity and mortality estimation) are widely used for estimating outcomes in healthcare, but their applicability is limited by i) demographic variations, ii) incomplete data availability of clinical variables, or iii) the need to incorporate data of new cohort-relevant clinical variables. We introduce a novel constrained multi-objective evolutionary machine learning (ML) optimization framework, SET (Scoring-table Evolutionary Tuning), that fine-tunes established clinical scoring tables to enhance performance while maintaining familiarity. SET works by iteratively making small constrained changes to the original table to improve performance across multiple metrics, while maintaining a similar structure, ensuring that minimal adjustments are made. This is in contrast to ML-based proposals that replace scoring tables with entirely new models or tables, which may encounter barriers to clinical adoption. Extensive evaluations across 8 established scoring tables and cohorts demonstrate that SET allows existing clinically-trusted scoring tables to adapt to variations in demography, enhancing performance. We also show that in situations with incomplete data availability of key clinical variables, SET can still augment scoring tables and perform competitively. Finally, SET can augment existing tables to incorporate new cohort-relevant features.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/fong25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/fong25a/fong25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-fong25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Kei Sen
    family: Fong
  - given: Mehul
    family: Motani
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 744-780
  id: fong25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 744
  lastpage: 780
  published: 2025-07-02 00:00:00 +0000
- title: 'MedMod: Multimodal Benchmark for Medical Prediction Tasks with Electronic Health Records and Chest X-Ray Scans'
  abstract: 'Multimodal machine learning provides a myriad of opportunities for developing models that integrate multiple modalities and mimic decision-making in the real world, such as in medical settings. However, benchmarks involving multimodal medical data are scarce, especially for routinely collected modalities such as Electronic Health Records (EHR) and Chest X-ray images (CXR). To contribute towards advancing multimodal learning in tackling real-world prediction tasks, we present MedMod, a multimodal medical benchmark with EHR and CXR using publicly available datasets MIMIC-IV and MIMIC-CXR, respectively. MedMod comprises five clinical prediction tasks: clinical conditions, in-hospital mortality, decompensation, length of stay, and radiological findings. We extensively evaluate several multimodal supervised learning models and self-supervised learning frameworks, making all of our code and models open-source.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/elsharief25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/elsharief25a/elsharief25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-elsharief25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Shaza
    family: Elsharief
  - given: Saeed
    family: Shurrab
  - given: Baraa Al
    family: Jorf
  - given: Leopoldo Julian Lechuga
    family: Lopez
  - given: Krzysztof J.
    family: Geras
  - given: Farah E.
    family: Shamout
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 781-803
  id: elsharief25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 781
  lastpage: 803
  published: 2025-07-02 00:00:00 +0000
- title: 'Transformer Model for Alzheimer’s Disease Progression Prediction Using Longitudinal Visit Sequences'
  abstract: 'Alzheimer’s disease (AD) is a neurodegenerative disorder with no known cure that affects tens of millions of people worldwide. Early detection of AD is critical for timely intervention to halt or slow the progression of the disease. In this study, we propose a Transformer model for predicting the stage of AD progression at a subject’s next clinical visit using features from a sequence of visits extracted from the subject’s visit history. We also rigorously compare our model to recurrent neural networks (RNNs) such as long short-term memory (LSTM), gated recurrent unit (GRU), and minimalRNN and assess their performances based on factors such as the length of prior visits and data imbalance. We test the importance of different feature categories and visit history, as well as compare the model to a newer Transformer-based model optimized for time series. Our model demonstrates strong predictive performance despite missing visits and missing features in available visits, particularly in identifying converter subjects–individuals transitioning to more severe disease stages–an area that has posed significant challenges in longitudinal prediction. The results highlight the model’s potential in enhancing early diagnosis and patient outcomes.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/moghaddami25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/moghaddami25a/moghaddami25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-moghaddami25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Mahdi
    family: Moghaddami
  - given: Clayton
    family: Schubring
  - given: Mohammad
    family: Siadat
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 804-816
  id: moghaddami25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 804
  lastpage: 816
  published: 2025-07-02 00:00:00 +0000
- title: 'LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records'
  abstract: 'Lab tests are fundamental for diagnosing diseases and monitoring patient conditions. However, frequent testing can be burdensome for patients, and test results may not always be immediately available. To address these challenges, we propose LabTOP, a unified model that predicts lab test outcomes by leveraging an autoregressive generative modeling approach on EHR data. Unlike conventional methods that estimate only a subset of lab tests or classify discrete value ranges, LabTOP performs continuous numerical predictions for a diverse range of lab items. We evaluate LabTOP on three publicly available EHR datasets, and demonstrate that it outperforms existing methods, including traditional machine learning models and state-of-the-art large language models. We also conduct extensive ablation studies to confirm the effectiveness of our design choices. We believe that LabTOP will serve as an accurate and generalizable framework for lab test outcome prediction, with potential applications in clinical decision support and early detection of critical conditions.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/im25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/im25a/im25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-im25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Sujeong
    family: Im
  - given: Jungwoo
    family: Oh
  - given: Edward
    family: Choi
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 817-843
  id: im25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 817
  lastpage: 843
  published: 2025-07-02 00:00:00 +0000
- title: 'A Study of Artifacts on Melanoma Classification under Diffusion-Based Perturbations'
  abstract: 'In melanoma classification, deep learning models have been shown to rely on non-medical artifacts (e.g., surgical markings) rather than clinically relevant features (e.g., lesion asymmetry), compromising their generalizability. In this work, we investigate the impact of artifacts on melanoma classification under two settings: (1) input disruptions, such as bounding boxes and frequency-based filtering, which isolate artifacts by region or frequency, and (2) a novel diffusion-based perturbation method that selectively introduces isolated artifacts into images, generating controlled pairs for direct comparison. We systematically analyze artifact biases in three benchmark datasets: ISIC 2018, HAM10000, and PH2. Our findings reveal that whole-image training outperforms lesion-only or background-only approaches, low-frequency features are essential for melanoma prediction, and classifiers are more sensitive to perturbations for the artifacts of ink markings, rulers, and patches. These results emphasize the need for systematic artifact assessment and provide insights for improving the robustness of melanoma classification models.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/jin25b.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/jin25b/jin25b.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-jin25b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Qixuan
    family: Jin
  - given: Marzyeh
    family: Ghassemi
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 844-861
  id: jin25b
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 844
  lastpage: 861
  published: 2025-07-02 00:00:00 +0000
- title: 'Uncertainty Quantification for Machine Learning in Healthcare: A Survey'
  abstract: 'Uncertainty Quantification (UQ) is pivotal in enhancing the robustness, reliability, and interpretability of Machine Learning (ML) systems for healthcare, optimizing resources and improving patient care. Despite the emergence of ML-based clinical decision support tools, the lack of principled quantification of uncertainty in ML models remains a major challenge. Current reviews have a narrow focus on analyzing the state-of-the-art UQ in specific healthcare domains without systematically evaluating method efficacy across different stages of model development, and despite a growing body of research, its implementation in healthcare applications remains limited. Therefore, in this survey, we provide a comprehensive analysis of current UQ in healthcare, offering an informed framework that highlights how different methods can be integrated into each stage of the ML pipeline including data processing, training and evaluation. We also highlight the most popular methods used in healthcare and novel approaches from other domains that hold potential for future adoption in the medical context. We expect this study will provide a clear overview of the challenges and opportunities of implementing UQ in the ML pipeline for healthcare, guiding researchers and practitioners in selecting suitable techniques to enhance the reliability and safety of ML-driven healthcare solutions and the trust that patients and clinicians place in them.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/lopez25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/lopez25a/lopez25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-lopez25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Leopoldo Julian Lechuga
    family: Lopez
  - given: Shaza
    family: Elsharief
  - given: Dhiyaa Al
    family: Jorf
  - given: Firas
    family: Darwish
  - given: Congbo
    family: Ma
  - given: Farah E.
    family: Shamout
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 862-907
  id: lopez25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 862
  lastpage: 907
  published: 2025-07-02 00:00:00 +0000
- title: 'Self-Explaining Hypergraph Neural Networks for Diagnosis Prediction'
  abstract: 'The burgeoning volume of electronic health records (EHRs) has enabled deep learning models to excel in predictive healthcare. However, for high-stakes applications such as diagnosis prediction, model interpretability remains paramount. Existing deep learning diagnosis prediction models with intrinsic interpretability often assign attention weights to every past diagnosis or hospital visit, providing explanations lacking flexibility and succinctness. In this paper, we introduce SHy, a self-explaining hypergraph neural network model, designed to offer personalized, concise and faithful explanations that allow for interventions from clinical experts. By modeling each patient as a unique hypergraph and employing a message-passing mechanism, SHy captures higher-order disease interactions and extracts distinct temporal phenotypes as personalized explanations. It also addresses the incompleteness of the EHR data by accounting for essential false negatives in the original diagnosis record. A qualitative case study and extensive quantitative evaluations on two real-world EHR datasets demonstrate the superior predictive performance and interpretability of SHy over existing state-of-the-art models.'
  volume: 287
  URL: https://proceedings.mlr.press/v287/yu25a.html
  PDF: https://raw.githubusercontent.com/mlresearch/v287/main/assets/yu25a/yu25a.pdf
  edit: https://github.com/mlresearch//v287/edit/gh-pages/_posts/2025-07-02-yu25a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the sixth Conference on Health, Inference, and Learning'
  publisher: 'PMLR'
  author: 
  - given: Leisheng
    family: Yu
  - given: Yanxiao
    family: Cai
  - given: Minxing
    family: Zhang
  - given: Xia
    family: Hu
  editor: 
  - given: Xuhai Orson
    family: Xu
  - given: Edward
    family: Choi
  - given: Pankhuri
    family: Singhal
  - given: Walter
    family: Gerych
  - given: Shengpu
    family: Tang
  - given: Monica
    family: Agrawal
  - given: Adarsh
    family: Subbaswamy
  - given: Elena
    family: Sizikova
  - given: Jessilyn
    family: Dunn
  - given: Roxana
    family: Daneshjou
  - given: Tasmie
    family: Sarker
  - given: Matthew
    family: McDermott
  - given: Irene
    family: Chen
  page: 908-924
  id: yu25a
  issued:
    date-parts: 
      - 2025
      - 7
      - 2
  firstpage: 908
  lastpage: 924
  published: 2025-07-02 00:00:00 +0000