Proceedings of Machine Learning Research

Proceedings of the 3rd Machine Learning for Health Symposium
Held in New Orleans, Louisiana, USA on 10 December 2023
Published as Volume 225 by the Proceedings of Machine Learning Research on 04 December 2023.
Volume Edited by: Stefan Hegselmann, Antonio Parziale, Divya Shanmugam, Shengpu Tang, Mercy Nyamewaa Asiedu, Serina Chang, Tom Hartvigsen, Harvineet Singh
Series Editors: Neil D. Lawrence
https://proceedings.mlr.press/v225/
Fri, 08 Dec 2023 23:38:57 +0000

Diffusion Model-Based Data Augmentation for Lung Ultrasound Classification with Limited Data

Deep learning models typically require large quantities of data for good generalization. However, acquiring labeled medical imaging data is expensive, particularly for rare pathologies. While standard data augmentation is routinely performed to improve data variety, it may not be sufficient to improve the performance of downstream tasks with a clinical diagnostic purpose. Here we investigate the applicability of SinDDM (Kulikov et al., 2023), a single-image denoising diffusion model, for medical image data augmentation with lung ultrasound (LUS) images. Qualitative and quantitative evaluations of the perceptual quality of the generated images were conducted. A multi-class classification task to detect various pathologies from LUS images was also employed to demonstrate the effectiveness of synthetic data augmentation using SinDDM. We further evaluated the image generation performance of FewDDM, an extended version of SinDDM trained on a limited number of images instead of a single image. Our results show that both SinDDM and FewDDM are able to generate images superior in quality to those of single-image generative adversarial networks (GANs), and are also highly effective in augmenting medical imaging data with a limited number of samples to improve downstream task performance.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/zhang23a.html

Zero-Shot ECG Diagnosis with Large Language Models and Retrieval-Augmented Generation

Recently, Large Language Models (LLMs) have become essential players in the deep learning domain. While their capabilities are evident across various textual tasks, this study aims to bridge the gap and explore the potential of leveraging LLMs in diagnosing cardiac diseases and sleep apnea from Electrocardiography (ECG). Earlier work touched on converting ECG signals into text for LLMs, but a comprehensive LLM-based approach for dealing with more complicated symptoms remains relatively unexplored. To investigate ECG diagnosis with an LLM-based approach, our research introduces a zero-shot retrieval-augmented diagnosis technique. We have built databases filled with specific domain knowledge for cardiac symptom and sleep apnea diagnosis, which moves the approach from merely relying on the LLMs' inherent knowledge to a more holistic pipeline that carefully crafts prompts and infuses expert knowledge to guide the LLMs. We evaluate the proposed approach on two datasets for diagnosing arrhythmia and sleep apnea, respectively. The evaluation results indicate that our zero-shot approach not only surpasses previous few-shot LLM-based methods but is also competitive with supervised learning techniques fully trained on extensive datasets.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/yu23b.html

Dynamic Interpretable Change Point Detection for Physiological Data Analysis

Identifying change points (CPs) in time series is crucial for guiding better decision-making in healthcare and facilitating timely responses to potential risks or opportunities.
In maternal health, monitoring health signals in pregnant women allows healthcare providers to promptly respond to complications like preeclampsia or enhance delivery time detection, improving overall maternal care. Existing Change Point Detection (CPD) methods often fail to generalize effectively due to the diverse underlying changes that can cause a CP. We propose Time Varying CPD (TiVaCPD), a change point detection method that captures different types of changes in the underlying distribution of multidimensional data. It combines a dynamic window MMD test with a graphical Lasso estimator of feature covariance to measure both changes in the joint distribution of the observations and changes in feature dynamics. TiVaCPD generates a unifying CP score by evaluating the relative similarity of the statistical tests. Additionally, the TiVaCPD score enhances interpretability by offering insight into the underlying causes of CPs through a detailed analysis of feature dynamics, which is especially valuable in healthcare applications. We evaluate the performance of TiVaCPD on both simulated and real-world data, showing that it can outperform state-of-the-art methods. We further demonstrate the application of TiVaCPD in a pregnancy-related case study, showcasing the joint shifts in physiological signals that facilitate the detection of delivery time.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/yu23a.html

TransEHR: Self-Supervised Transformer for Clinical Time Series Data

Deep neural networks, including the Transformer architecture, have achieved remarkable performance in various time series tasks. However, their effectiveness in handling clinical time series data is hindered by specific challenges: 1) sparse event sequences collected asynchronously with multivariate time series, and 2) limited availability of labeled data.
To address these challenges, we propose TransEHR, a self-supervised Transformer model designed to efficiently encode multi-sourced asynchronous sequential data, such as structured Electronic Health Records (EHRs). We introduce three pretext tasks for pre-training the Transformer model, utilizing large amounts of unlabeled structured EHR data, followed by fine-tuning on downstream prediction tasks using the limited labeled data. Through extensive experiments on three real-world health datasets, we demonstrate that our model achieves state-of-the-art performance on benchmark clinical tasks, including in-hospital mortality classification, phenotyping, and length-of-stay prediction. Our findings highlight the efficacy of TransEHR in addressing the challenges associated with clinical time series data, thus contributing to advancements in healthcare analytics. Our code is available at https://github.com/SigmaTsing/TransEHR.git.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/xu23a.html

Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild

Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable, which hampers clinical adoption. In this paper, we propose a hybrid variational autoencoder to learn interpretable representations of CGM and meal data. Our method grounds the latent space to the inputs of a mechanistic differential equation, producing embeddings that reflect physiological quantities, such as insulin sensitivity, glucose effectiveness, and basal glucose levels.
Moreover, we introduce a novel method to infer the glucose appearance rate, making the mechanistic model robust to unreliable meal logs. On a dataset of CGM and self-reported meals from individuals with type-2 diabetes and pre-diabetes, our unsupervised representation discovers a separation between individuals proportional to their disease severity. Our embeddings produce clusters that are up to 4x better than naive, expert, black-box, and pure mechanistic features. Our method provides a nuanced, yet interpretable, embedding space to compare glycemic control within and across individuals, directly learnable from in-the-wild data.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/wang23a.html

GANcMRI: Cardiac magnetic resonance video generation and physiologic guidance using latent space prompting

Generative artificial intelligence can be applied to medical imaging on tasks such as privacy-preserving image generation and super-resolution and denoising of existing images. Few prior approaches have used cardiac magnetic resonance imaging (cMRI) as a modality given the complexity of videos (the addition of the temporal dimension) as well as the limited scale of publicly available datasets. We introduce GANcMRI, a generative adversarial network that can synthesize cMRI videos with physiological guidance based on latent space prompting. GANcMRI uses a StyleGAN framework to learn the latent space from individual video frames and leverages the time-dependent trajectory between end-systolic and end-diastolic frames in the latent space to predict progression and generate motion over time. We proposed various methods for modeling latent time-dependent trajectories and found that our Frame-to-frame approach generates the best motion and video quality.
GANcMRI generated high-quality cMRI image frames that are indistinguishable from real ones by cardiologists; however, artifacts in video generation still allow cardiologists to recognize the difference between real and generated videos. The generated cMRI videos can be prompted to apply physiology-based adjustments, which produce clinically relevant phenotypes recognizable by cardiologists. GANcMRI has many potential applications such as data augmentation, education, anomaly detection, and preoperative planning.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/vukadinovic23a.html

Interpretable Survival Analysis for Heart Failure Risk Prediction

Survival analysis, or time-to-event analysis, is an important and widespread problem in healthcare research. Medical research has traditionally relied on Cox models for survival analysis, due to their simplicity and interpretability. Cox models assume a log-linear hazard function as well as proportional hazards over time, and can perform poorly when these assumptions fail. Newer survival models based on machine learning avoid these assumptions and offer improved accuracy, yet sometimes at the expense of model interpretability, which is vital for clinical use. We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models. Specifically, we use an improved version of survival stacking to transform a survival analysis problem into a classification problem, ControlBurn to perform feature selection, and Explainable Boosting Machines to generate interpretable predictions. To evaluate our pipeline, we predict risk of heart failure using a large-scale EHR database. Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/van-ness23a.html

Curriculum Self-Supervised Learning for 3D CT Cardiac Image Segmentation

Automating the segmentation of various cardiac chamber structures (e.g., pulmonary artery, aorta, etc.) in 3D CT cardiac imaging remains a significant challenge. This challenge primarily arises from the dynamic nature of the human heart and substantial anatomical variations in terms of organ texture, shape, and size across different patients. These factors collectively result in a scarcity of annotated data, posing a significant hurdle for training data-hungry deep models. The self-supervised learning (SSL) paradigm offers a promising solution to overcome this obstacle since it eliminates the reliance on massive annotated data for training deep models. However, existing SSL approaches fall short in capturing effective representations from 3D cardiac volumes due to the oversight of the dynamic nature of human hearts in the design of their pretext tasks. To address this challenge, we propose a novel SSL method based on the curriculum learning paradigm, which progressively increases the task difficulty during the pretraining stages. Our method enables the SSL model to initially acquire fundamental knowledge about the data, which can subsequently serve as valuable contextual clues for solving more complex tasks during later stages of pretraining. Our extensive experiments demonstrate that the SSL pre-trained model, trained using our strategy, acquires generalizable representations capable of effectively segmenting various existing cardiac chamber structures.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/taher23a.html

Eigen: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images

Information Extraction (IE) from document images is challenging due to the high variability of layout formats. Deep models such as etc. In this work, we propose a novel approach, EIGEN (Expert-Informed Joint Learning aGgrEgatioN), which combines rule-based methods with deep learning models using data programming approaches to circumvent the requirement of annotating large amounts of training data. Specifically, EIGEN consolidates weak labels induced from multiple heuristics through generative models and uses them along with a small number of annotated labels to jointly train a deep model. In our framework, we propose the use of labeling functions that incorporate contextual information, thus capturing the visual and language context of a word for accurate categorization. We empirically show that our framework can significantly improve the performance of state-of-the-art deep models when very few labeled data instances are available. Source code is available at https://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/singh23a.html

LymphoML: An interpretable artificial intelligence-based method identifies morphologic features that correlate with lymphoma subtype

The accurate classification of lymphoma subtypes using hematoxylin and eosin (H&E)-stained tissue is complicated by the wide range of morphological features these cancers can exhibit. We present LymphoML - an interpretable machine learning method that identifies morphologic features that correlate with lymphoma subtypes.
Our method applies steps to process H&E-stained tissue microarray cores, segment nuclei and cells, compute features encompassing morphology, texture, and architecture, and train gradient-boosted models to make diagnostic predictions. LymphoML’s interpretable models, developed on a limited volume of H&E-stained tissue, achieve non-inferior diagnostic accuracy to pathologists using whole-slide images and outperform black box deep-learning on a dataset of 670 cases from Guatemala spanning 8 lymphoma subtypes. Using SHapley Additive exPlanation (SHAP) analysis, we assess the impact of each feature on model prediction and find that nuclear shape features are most discriminative for DLBCL (F1-score: 78.7%) and classical Hodgkin lymphoma (F1-score: 74.5%). Finally, we provide the first demonstration that a model combining features from H&E-stained tissue with features from a standardized panel of 6 immunostains results in a similar diagnostic accuracy (85.3%) to a 46-stain panel (86.1%).
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/shankar23a.html

Robust semi-supervised segmentation with timestep ensembling diffusion models

Medical image segmentation is a challenging task, made more difficult by many datasets’ limited size and annotations. Denoising diffusion probabilistic models (DDPM) have recently shown promise in modelling the distribution of natural images and were successfully applied to various medical imaging tasks. This work focuses on semi-supervised image segmentation using diffusion models, particularly addressing domain generalisation. Firstly, we demonstrate that smaller diffusion steps generate latent representations that are more robust for downstream tasks than larger steps.
Secondly, we use this insight to propose an improved ensembling scheme that leverages information-dense small steps and the regularising effect of larger steps to generate predictions. Our model shows significantly better performance in domain-shifted settings while retaining competitive performance in-domain. Overall, this work highlights the potential of DDPMs for semi-supervised medical image segmentation and provides insights into optimising their performance under domain shift.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/rosnati23a.html

MULTIPAR: Supervised Irregular Tensor Factorization with Multi-task Learning for Computational Phenotyping

Tensor factorization has received increasing interest due to its intrinsic ability to capture latent factors in multi-dimensional data, with many applications including Electronic Health Records (EHR) mining. PARAFAC2 and its variants have been proposed to address irregular tensors where one of the tensor modes is not aligned, e.g., different patients in EHRs may have records of different lengths. PARAFAC2 has been successfully applied to EHRs for extracting meaningful medical concepts (phenotypes). Despite recent advancements, current models’ predictability and interpretability are not satisfactory, which limits their utility for downstream analysis. In this paper, we propose MULTIPAR: a supervised irregular tensor factorization with multi-task learning for computational phenotyping. MULTIPAR is flexible enough to incorporate both static (e.g., in-hospital mortality prediction) and continuous or dynamic (e.g., the need for ventilation) tasks. By supervising the tensor factorization with downstream prediction tasks and leveraging information from multiple related predictive tasks, MULTIPAR can yield not only more meaningful phenotypes but also better predictive performance for downstream tasks.
We conduct extensive experiments on two real-world temporal EHR datasets to demonstrate that MULTIPAR is scalable and achieves better tensor fit with more meaningful subgroups and stronger predictive performance compared to existing state-of-the-art methods. The implementation of MULTIPAR is available at https://github.com/yifeiren13/MULTIPAR.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/ren23a.html

Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report

Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest, most current studies focus solely on classification or regression tasks, which overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Also, since interpreting ECG as images is more affordable and accessible, we process ECG as encoded images and adopt a vision-language learning paradigm to jointly learn vision-language alignment between encoded ECG images and ECG diagnosis reports. Encoding ECG into images can result in an efficient ECG retrieval system, which will be highly practical and useful in clinical applications. More importantly, our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/qiu23a.html

Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series

Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and demonstrate how it elegantly overcomes these challenges. To make the model learning feasible, we derive two algorithms to sample the sequences of the latent variables in the CHMM: samplers based on (i) particle filtering and (ii) factorized approximation. Compared to existing inference methods, our algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model. Experiments on challenging real-world epidemiological and semi-synthetic data demonstrate the advantages of the M-CHMM: improved data fit, capacity to efficiently handle missing and noisy measurements, improved prediction accuracy, and ability to identify interpretable subsets in the data.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/poyraz23a.html

Using Reinforcement Learning for Multi-Objective Cluster-Level Optimization of Non-Pharmaceutical Interventions for Infectious Disease

In the early stages of an infectious disease crisis, non-pharmaceutical interventions (NPIs) such as quarantines and testing can play an important role.
Optimizing the delivery of NPIs is challenging as they can impose substantial direct costs (e.g., test costs) and human impacts (e.g., quarantine of uninfected individuals) and can be especially difficult to target for infections that may spread pre- or asymptomatically. In addition, superspreading, a common characteristic of many infectious diseases, induces informational dependencies across a cluster (group of individuals exposed by the same seed case). We formulate NPI optimization as a partially observable Markov decision process (POMDP), which we aim to solve with reinforcement learning (RL). We find that RL provides a promising technical foundation, though even modern approaches struggle with this problem. We propose a novel RL approach that leverages a supervised learning decoder as well as permutation-invariant, fixed-size observation representations. Through extensive experimentation and evaluation, we show that our optimized policy can outperform all benchmarks by up to 27%. Additionally, we show that the policies discovered by RL can be distilled into decision trees to simplify deployment while still achieving strong performance. We publicly release our code and RL environments at: https://github.com/XueqiaoPeng/Covid-RLSL
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/peng23a.html

Nonparametric modeling of the composite effect of multiple nutrients on blood glucose dynamics

In biomedical applications it is often necessary to estimate a physiological response to a treatment consisting of multiple components, and learn the separate effects of the components in addition to the joint effect.
Here, we extend existing probabilistic nonparametric approaches to explicitly address this problem. We also develop a new convolution-based model for composite treatment–response curves that is more biologically interpretable. We validate our models by estimating the impact of carbohydrate and fat in meals on blood glucose. By differentiating treatment components, incorporating their dosages, and sharing statistical information across patients via a hierarchical multi-output Gaussian process, our method improves prediction accuracy over existing approaches, and allows us to interpret the different effects of carbohydrates and fat on the overall glucose response.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/odnoblyudova23a.html

Temporal Supervised Contrastive Learning for Modeling Patient Risk Progression

We consider the problem of predicting how the likelihood of an outcome of interest for a patient changes over time as we observe more of the patient’s data. To solve this problem, we propose a supervised contrastive learning framework that learns an embedding representation for each time step of a patient time series. Our framework learns the embedding space to have the following properties: (1) nearby points in the embedding space have similar predicted class probabilities, (2) adjacent time steps of the same time series map to nearby points in the embedding space, and (3) time steps with very different raw feature vectors map to far apart regions of the embedding space. To achieve property (3), we employ a nearest neighbor pairing mechanism in the raw feature space. This mechanism also serves as an alternative to “data augmentation”, a key ingredient of contrastive learning, which lacks a standard procedure that is adequately realistic for clinical tabular data, to our knowledge.
We demonstrate that our approach outperforms state-of-the-art baselines in predicting mortality of septic patients (MIMIC-III dataset) and tracking progression of cognitive impairment (ADNI dataset). Our method also consistently recovers the correct synthetic dataset embedding structure across experiments, a feat not achieved by baselines. Our ablation experiments show the pivotal role of our nearest neighbor pairing.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/noroozizadeh23a.html

Pragmatic Radiology Report Generation

When pneumonia is not found on a chest X-ray, should the report describe this negative observation or omit it? We argue that this question cannot be answered from the X-ray alone and requires a pragmatic perspective, which captures the communicative goal that radiology reports serve between radiologists and patients. However, the standard image-to-text formulation for radiology report generation fails to incorporate such pragmatic intents. Following this pragmatic perspective, we demonstrate that the indication, which describes why a patient comes for an X-ray, drives the mentions of negative observations. We thus introduce indications as additional input to report generation. With respect to the output, we develop a framework to identify information that cannot be inferred from the image, which could be a source of model hallucinations, and limit such hallucinations by cleaning ground-truth reports. Finally, we use indications and cleaned ground-truth reports to develop pragmatic models, and show that they outperform existing methods not only in new pragmatics-inspired metrics (e.g., +4.3 Negative F1) but also in standard metrics (e.g., +6.3 Positive F1 and +11.0 BLEU-2).
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/nguyen23a.html

Supervised Electrocardiogram (ECG) Features Outperform Knowledge-based And Unsupervised Features In Individualized Survival Prediction

An electrocardiogram (ECG) provides crucial information about an individual’s health status. Researchers utilize ECG data to develop learners for a variety of tasks, ranging from diagnosing ECG abnormalities to estimating time to death – here modeled as individual survival distributions (ISDs). The way the ECG is represented is important for creating an effective learner. While many traditional ECG-based prediction models rely on hand-crafted features, such as heart rate, this study aims to achieve a better representation. The effectiveness of various ECG-based feature extraction methods for prediction of ISDs, either supervised or unsupervised, has not been explored previously. The study uses a large ECG dataset from 244,077 patients with over 1.6 million 12-lead ECGs, each labeled with the patient’s disease – one or more International Classification of Diseases (ICD) codes. We explored extracting high-level features from ECG traces using various approaches, then trained models that used these ECG features (along with age and sex), across a range of training sizes, to estimate patient-specific ISDs. The results showed that the supervised feature extractor method produced ECG features that can estimate ISD curves better than ECG features obtained from unsupervised or knowledge-based methods. Supervised ECG features required fewer training instances (as few as 500) to learn ISD models that performed better than the baseline model that only used age and sex. On the other hand, unsupervised and knowledge-based ECG features required over 5,000 training samples to produce ISD models that performed better than the baseline.
The study’s findings may assist researchers in selecting the most appropriate approach for extracting high-level features from ECG signals to estimate patient-specific ISD curves.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/nademi23a.html

Med-Flamingo: a Multimodal Medical Few-shot Learner

Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable down-stream datasets, which poses a significant limitation as in many medical applications data is scarce, necessitating models that are capable of learning from few examples in real-time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets including a novel challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinician’s rating and, for the first time, enables multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app at https://github.com/snap-stanford/med-flamingo.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/moor23a.html

Designing and evaluating an online reinforcement learning agent for physical exercise recommendations in N-of-1 trials

Personalized adaptive interventions offer the opportunity to increase patient benefits; however, there are challenges in their planning and implementation. Once implemented, it is an important question whether personalized adaptive interventions are indeed clinically more effective compared to a fixed gold standard intervention. In this paper, we present an innovative N-of-1 trial study design testing whether implementing a personalized intervention by an online reinforcement learning agent is feasible and effective. Throughout, we use a new study on physical exercise recommendations to reduce pain in endometriosis for illustration. We describe the design of a contextual bandit recommendation agent and evaluate the agent in simulation studies. The results show that, first, implementing a personalized intervention by an online reinforcement learning agent is feasible. Second, such adaptive interventions have the potential to improve patients’ benefits even if only a few observations are available. As one challenge, they add complexity to the design and implementation process. In order to quantify the expected benefit, data from previous interventional studies is required. We expect our approach to be transferable to other interventions and clinical interventions.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/meier23a.html

Compositional Q-learning for electrolyte repletion with imbalanced patient sub-populations

Reinforcement learning (RL) is an effective framework for solving sequential decision-making tasks. However, applying RL methods in medical care settings is challenging in part due to heterogeneity in treatment response among patients.
Some patients can be treated with standard protocols whereas others, such as those with chronic diseases, need personalized treatment planning. Traditional RL methods often fail to account for this heterogeneity, because they assume that all patients respond to the treatment in the same way (i.e., transition dynamics are shared). We introduce Compositional Fitted Q-iteration (CFQI), which uses a compositional task structure to represent heterogeneous treatment responses in medical care settings. A compositional task consists of several variations of the same task, each progressing in difficulty; solving simpler variants of the task can enable efficient solving of harder variants. CFQI uses a compositional Q-value function with separate modules for each task variant, allowing it to take advantage of shared knowledge while learning distinct policies for each variant. We validate CFQI’s performance using a Cartpole environment and use CFQI to recommend electrolyte repletion for patients with and without renal disease. Our results demonstrate that CFQI is robust even in the presence of class imbalance, enabling effective information usage across patient sub-populations. CFQI exhibits great promise for clinical applications in scenarios characterized by known compositional structures. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/mandyam23a.html https://proceedings.mlr.press/v225/mandyam23a.html Anytime-valid inference in N-of-1 trials App-based N-of-1 trials offer a scalable experimental design for assessing the effects of health interventions at an individual level. Their practical success depends on the strong motivation of participants, which, in turn, translates into high adherence and reduced loss to follow-up. One way to maintain participant engagement is by sharing their interim results. Continuously testing hypotheses during a trial, known as “peeking”, can also lead to shorter, lower-risk trials by detecting strong effects early.
Nevertheless, traditionally, results are only presented upon the trial’s conclusion. In this work, we introduce a potential outcomes framework that permits interim peeking of the results and enables statistically valid inferences to be drawn at any point during N-of-1 trials. Our work builds on the growing literature on valid confidence sequences, which enables anytime-valid inference with uniform type-1 error guarantees over time. We propose several causal estimands for treatment effects applicable in an N-of-1 trial and demonstrate, through empirical evaluation, that the proposed approach results in valid confidence sequences over time. We anticipate that incorporating anytime-valid inference into clinical trials can significantly enhance trial participation and empower participants. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/malenica23a.html https://proceedings.mlr.press/v225/malenica23a.html Gradient-Map-Guided Adaptive Domain Generalization for Cross Modality MRI Segmentation Cross-modal MRI segmentation is of great value for computer-aided medical diagnosis, enabling flexible data acquisition and model generalization. However, most existing methods have difficulty in handling local variations in domain shift and typically require a significant amount of data for training, which hinders their usage in practice. To address these problems, we propose a novel adaptive domain generalization framework, which integrates a learning-free cross-domain representation based on image gradient maps and a class prior-informed test-time adaptation strategy for mitigating local domain shift. We validate our approach on two multi-modal MRI datasets with six cross-modal segmentation tasks. Across all the task settings, our method consistently outperforms competing approaches and shows a stable performance even with limited training data. Our code is available at https://github.com/cuttle-fish-my/GM-Guided-DG.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/li23a.html https://proceedings.mlr.press/v225/li23a.html On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series Recent advances in deep learning architectures for sequence modeling have not fully transferred to tasks handling time-series from electronic health records. In particular, in problems related to the Intensive Care Unit (ICU), the state of the art is still to treat sequence classification as a tabular problem solved with tree-based methods. Recent findings in deep learning for tabular data are now surpassing these classical methods by better handling the severe heterogeneity of data input features. Given the similar level of feature heterogeneity exhibited by ICU time-series and motivated by these findings, we explore these novel methods’ impact on clinical sequence modeling tasks. By jointly using such advances in deep learning for tabular data, our primary objective is to underscore the importance of step-wise embeddings in time-series modeling, which remain unexplored in machine learning methods for clinical data. On a variety of clinically relevant tasks from two large-scale ICU datasets, MIMIC-III and HiRID, our work provides an exhaustive analysis of state-of-the-art methods for tabular time-series as time-step embedding models, showing overall performance improvement. In particular, we demonstrate the importance of feature grouping in clinical time-series, with significant performance gains when considering features within predefined semantic groups in the step-wise embedding module. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/kuznetsova23a.html https://proceedings.mlr.press/v225/kuznetsova23a.html Deep Multimodal Fusion for Surgical Feedback Classification Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvements in surgical training.
Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., through visual cues like pointing to anatomic elements). In this work, we leverage a clinically-validated five-category classification of surgical feedback: “Anatomic”, “Technical”, “Procedural”, “Praise” and “Visual Aid”. We then develop a multi-label machine learning model to classify these five categories of surgical feedback from inputs of text, audio, and video modalities. The ultimate goal of our work is to help automate the annotation of real-time contextual surgical feedback at scale. Our automated classification of surgical feedback achieves AUCs ranging from 71.5 to 77.6 with the fusion improving performance by 3.1%. We also show that high-quality manual transcriptions of feedback audio from experts improve AUCs to between 76.5 and 96.2, which demonstrates a clear path toward future improvements. Empirically, we find that the Staged training strategy, which first pre-trains each modality separately and then trains them jointly, is more effective than training the different modalities all together. We also present intuitive findings on the importance of modalities for different feedback categories. This work offers an important first look at the feasibility of automated classification of real-world live surgical feedback based on text, audio, and video modalities. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/kocielnik23a.html https://proceedings.mlr.press/v225/kocielnik23a.html Multimodal Pretraining of Medical Time Series and Notes Within the intensive care unit (ICU), a wealth of patient data, including clinical measurements and clinical notes, is readily available. This data is a valuable resource for comprehending patient health and informing medical decisions, but it also poses many challenges for analysis.
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data, a challenge in critical care. To address this, we propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes. Our approach combines contrastive and masked token prediction tasks during pretraining. Semi-supervised experiments on the MIMIC-III dataset demonstrate the effectiveness of our self-supervised pretraining. In downstream tasks, including in-hospital mortality prediction and phenotyping, our pretrained model outperforms baselines in settings where only a fraction of the data is labeled, emphasizing its ability to enhance ICU data analysis. Notably, our method excels in situations where very few labels are available, as evidenced by an increase in the AUC-ROC for in-hospital mortality by 0.17 and in AUC-PR for phenotyping by 0.1 when only 1% of labels are accessible. This work advances self-supervised learning in the healthcare domain, optimizing clinical insights from abundant yet challenging ICU data. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/king23a.html https://proceedings.mlr.press/v225/king23a.html Learning Generalized Medical Image Representations Through Image-Graph Contrastive Pretraining Medical image interpretation using deep learning has shown promise but often requires extensive expert-annotated datasets. To reduce this annotation burden, we develop an Image-Graph Contrastive Learning framework that pairs chest X-rays with structured report knowledge graphs automatically extracted from radiology notes. Our approach uniquely encodes the disconnected graph components via a relational graph convolution network and transformer attention.
In experiments on the CheXpert dataset, this novel graph encoding strategy enabled the framework to outperform existing methods that use image-text contrastive learning in 1% linear evaluation and few-shot settings, while achieving comparable performance to radiologists. By exploiting unlabeled paired images and text, our framework demonstrates the potential of structured clinical insights to enhance contrastive learning for medical images. This work points toward reducing demands on medical experts for annotations, improving diagnostic precision, and advancing patient care through robust medical image understanding. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/khanna23a.html https://proceedings.mlr.press/v225/khanna23a.html How Fair are Medical Imaging Foundation Models? While medical imaging foundation models have led to significant improvements across various tasks, the pivotal issue of subgroup fairness in these foundation models has remained largely unexplored. Our work bridges this research gap by presenting the first comprehensive study analyzing the subgroup fairness of six diverse foundation models, encompassing various pre-training methods, sources of pre-training data, and model architectures. In doing so, we discover a concerning trade-off: foundation models pre-trained on medical images achieve better overall performance but are consistently less fair than those pre-trained on natural images, with sometimes even worse fairness than baseline models trained from scratch. To mitigate these fairness disparities, we show that increasing both the volume of pre-training data and the number of pre-training epochs enhances subgroup fairness of medical imaging pre-trained models. Furthermore, to decouple the fairness bias from the pre-training and fine-tuning stages, we employ balanced datasets for fine-tuning.
While fine-tuning on balanced datasets partially mitigates fairness issues, it is insufficient to completely eliminate the biases from the pre-training stage, prompting the need for careful design and evaluation of medical imaging foundation models. Our granular analysis reveals that medical imaging pre-trained models tend to favor majority racial subgroups (White, Asian) whereas natural imaging pre-trained models tend to favor minority racial subgroups (Black). Additionally, across all foundation models, we observe a consistent underperformance on the female patients cohort. As the community moves towards designing specialized foundation models for medical imaging, we hope our timely research provides crucial insights to help inform more equitable model development. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/khan23a.html https://proceedings.mlr.press/v225/khan23a.html NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patient’s status. Thus, accurate automated diagnostic coding of medical notes has become critical for a learning healthcare system. Recent developments in long-document transformer architectures have enabled attention-based deep-learning models to adjudicate medical notes. In addition, contrastive loss functions have been used to jointly pre-train large language and image models with noisy labels. 
To further improve the automated adjudication of medical notes, we developed an approach based on i) models for ICD-10 diagnostic code sequences using a large real-world data set, ii) large language models for medical notes, and iii) contrastive pre-training to build an integrated model of both ICD-10 diagnostic codes and corresponding medical text. We demonstrate that a contrastive approach for pre-training improves performance over prior state-of-the-art models for the MIMIC-III-50, MIMIC-III-rare50, and MIMIC-III-full diagnostic coding tasks. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/kailas23a.html https://proceedings.mlr.press/v225/kailas23a.html Activation From Sparse 2D Cardiac MRIs Identifying regions of late mechanical activation (LMA) of the left ventricular (LV) myocardium is critical in determining the optimal pacing site for cardiac resynchronization therapy in patients with heart failure. Several deep learning-based approaches have been developed to predict 3D LMA maps of LV myocardium from a stack of sparse 2D cardiac magnetic resonance images (MRIs). However, these models often only loosely consider the geometric shape structure of the myocardium. This makes the reconstructed activation maps suboptimal, reducing the accuracy of predicting the late activating regions of hearts. In this paper, we propose to use shape-constrained diffusion models to better reconstruct a 3D LMA map, given a limited number of 2D cardiac MRI slices. In contrast to previous methods that primarily rely on spatial correlations of image intensities for 3D reconstruction, our model leverages object shape as priors learned from the training data to guide the reconstruction process. To achieve this, we develop a joint learning network that simultaneously learns a mean shape under deformation models. Each reconstructed image is then considered as a deformed variant of the mean shape.
To validate the performance of our model, we train and test the proposed framework on a publicly available mesh dataset of 3D myocardium and compare it with state-of-the-art deep learning-based reconstruction models. Experimental results show that our model achieves superior performance in reconstructing the 3D LMA maps as compared to the state-of-the-art models. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/jayakumar23a.html https://proceedings.mlr.press/v225/jayakumar23a.html REMEDI: REinforcement learning-driven adaptive MEtabolism modeling of primary sclerosing cholangitis DIsease progression Primary sclerosing cholangitis (PSC) is a rare disease wherein altered bile acid metabolism contributes to sustained liver injury. This paper introduces REMEDI, a framework that captures bile acid dynamics and the body’s adaptive response during PSC progression that can assist in exploring treatments. REMEDI merges a differential equation (DE)-based mechanistic model that describes bile acid metabolism with reinforcement learning (RL) to emulate the body’s adaptations to PSC continuously. An objective of adaptation is to maintain homeostasis by regulating enzymes involved in bile acid metabolism. These enzymes correspond to the parameters of the DEs. REMEDI leverages RL to approximate adaptations in PSC, treating homeostasis as a reward signal and the adjustment of the DE parameters as the corresponding actions. On real-world data, REMEDI generated bile acid dynamics and parameter adjustments consistent with published findings. Also, our results support discussions in the literature that early administration of drugs that suppress bile acid synthesis may be effective in PSC treatment. 
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/hu23a.html https://proceedings.mlr.press/v225/hu23a.html Machine Learning for Health (ML4H) 2023 Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/hegselmann23a.html https://proceedings.mlr.press/v225/hegselmann23a.html A Probabilistic Method to Predict Classifier Accuracy on Larger Datasets given Small Pilot Data Practitioners building classifiers often start with a smaller pilot dataset and plan to grow to larger data in the near future. Such projects need a toolkit for extrapolating how much classifier accuracy may improve from a 2x, 10x, or 50x increase in data size. While existing work has focused on finding a single “best-fit” curve using various functional forms like power laws, we argue that modeling and assessing the uncertainty of predictions is critical yet has seen less attention. In this paper, we propose a Gaussian process model to obtain probabilistic extrapolations of accuracy or similar performance metrics as dataset size increases. We evaluate our approach in terms of error, likelihood, and coverage across six datasets. Though we focus on medical tasks and image modalities, our open source approach generalizes to any kind of classifier. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/harvey23a.html https://proceedings.mlr.press/v225/harvey23a.html Towards Reliable Dermatology Evaluation Benchmarks Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data-cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. 
Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the ISIC. Along with this paper, we publish revised file lists for each dataset, which should be used for model evaluation, at https://github.com/Digital-Dermatology/SelfClean-Revised-Benchmarks. Our work paves the way for more trustworthy performance assessment in digital dermatology. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/groger23a.html https://proceedings.mlr.press/v225/groger23a.html LLMs Accelerate Annotation for Medical Information Extraction The unstructured nature of clinical notes within electronic health records often conceals vital patient-related information, making it challenging to access or interpret. To uncover this hidden information, specialized Natural Language Processing (NLP) models are required. However, training these models necessitates large amounts of labeled data, a process that is both time-consuming and costly when relying solely on human experts for annotation. In this paper, we propose an approach that combines Large Language Models (LLMs) with human expertise to create an efficient method for generating ground truth labels for medical text annotation. By utilizing LLMs in conjunction with human annotators, we significantly reduce the human annotation burden, enabling the rapid creation of labeled datasets. We rigorously evaluate our method on a medical information extraction task, demonstrating that our approach not only substantially cuts down on human intervention but also maintains high accuracy. The results highlight the potential of using LLMs to improve the utilization of unstructured clinical data, allowing for the swift deployment of tailored NLP solutions in healthcare.
Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/goel23a.html https://proceedings.mlr.press/v225/goel23a.html Multi-modal Graph Learning over UMLS Knowledge Graphs Clinicians are increasingly looking towards machine learning to gain insights about patient progression. We propose a novel approach named Multi-Modal UMLS Graph Learning (MMUGL) for learning meaningful representations of medical concepts using graph neural networks over knowledge graphs based on the Unified Medical Language System (UMLS). These concept representations are aggregated to represent a patient visit and then fed into a sequence model to perform predictions at the granularity of multiple hospital visits of a patient. We improve performance by incorporating prior medical knowledge and considering multiple modalities. We compare our method to existing architectures proposed to learn representations at different granularities on the MIMIC-III dataset and show that our approach outperforms these methods. The results demonstrate the significance of multi-modal medical concept representations based on prior medical knowledge. We provide our code on GitHub at https://github.com/ratschlab/mmugl. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/burger23a.html https://proceedings.mlr.press/v225/burger23a.html Learning Temporal Higher-order Patterns to Detect Anomalous Brain Activity Due to recent advances in machine learning on graphs, representing the connections of the human brain as a network has become one of the most pervasive analytical paradigms.
However, most existing graph machine learning-based methods suffer from a subset of five critical limitations: They are (1) designed for simple pair-wise interactions while recent studies on the human brain show the existence of higher-order dependencies of brain regions, (2) designed to perform on pre-constructed networks from time-series data, which limits their generalizability, (3) designed for classifying brain networks, limiting their ability to reveal underlying patterns that might cause the symptoms of a disease or disorder, (4) designed for learning of static patterns, missing the dynamics of human brain activity, and (5) designed for a supervised setting, so that their performance relies on the existence of labeled data. To address these limitations, we present an end-to-end anomaly detection model that automatically learns the structure of the hypergraph representation of the brain from neuroimage data. The model uses a tetra-stage message-passing mechanism along with an attention mechanism that learns the importance of higher-order dependencies of brain regions. We further present a new adaptive hypergraph pooling to obtain brain-level representation, enabling detection of the neuroimages of people living with a specific disease or disorder. Our experiments on Parkinson’s Disease, Attention Deficit Hyperactivity Disorder, and Autism Spectrum Disorder show the efficiency and effectiveness of our approaches in detecting anomalous brain activity. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/behrouz23a.html https://proceedings.mlr.press/v225/behrouz23a.html Representing visual classification as a linear combination of words Explainability is a longstanding challenge in deep learning, especially in high-stakes domains like healthcare. Common explainability methods highlight image regions that drive an AI model’s decision. Humans, however, heavily rely on language to convey explanations of not only “where” but “what”.
Additionally, most explainability approaches focus on explaining individual AI predictions, rather than describing the features used by an AI model in general. The latter would be especially useful for model and dataset auditing, and potentially even knowledge generation as AI is increasingly being used in novel tasks. Here, we present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task. By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words, resulting in a weight for each word that indicates its alignment with the vision-based classifier. We assess our approach using two medical imaging classification tasks, where we find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training. However, our approach also identifies the potential for ‘shortcut connections’ in the public datasets used. Towards a functional measure of explainability, we perform a pilot reader study where we find that the AI-identified words can enable non-expert humans to perform a specialized medical task at a non-trivial level. Altogether, our results emphasize the potential of using multimodal foundational models to deliver intuitive, language-based explanations of visual tasks. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/agarwal23a.html https://proceedings.mlr.press/v225/agarwal23a.html Towards Equitable Kidney Tumor Segmentation: Bias Evaluation and Mitigation Kidney tumors, affecting over 400,000 individuals annually, require accurate segmentation for effective treatment and surgical planning. Yet, manual segmentation is time-consuming, steering the medical community towards automated methods.
While computer-aided diagnostic tools promise improvements, their transition into the real world mandates an understanding of their performance across diverse population subgroups. Our study is the first to investigate fairness concerning kidney and tumor segmentation, particularly focusing on sensitive attributes like sex and age. Our findings show that bias in performance exists across both attributes. In particular, despite a male-dominated training dataset, females showed superior segmentation performance. Age groups 60-70 and above 70 also deviated significantly from the average performance for all ages. To address these biases, we comprehensively explore bias mitigation strategies, encompassing pre-processing techniques (Resampling Algorithm and Stratified Batch Sampling) and in-processing methods (Fair Meta-learning and architectural adjustments). Specifically, Attention U-Net was identified as the optimal model for balancing fairness across both attributes while maintaining high segmentation performance. We present a crucial insight that the architecture itself could be a source of inherent biases, and careful selection of the network design can inherently reduce these biases. Our assessment of UNet variants challenges the prevailing paradigm of model selection predicated solely on segmentation performance, especially considering the profound implications biases can have in clinical outcomes. Mon, 04 Dec 2023 00:00:00 +0000 https://proceedings.mlr.press/v225/afzal23a.html https://proceedings.mlr.press/v225/afzal23a.html