MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata

Yihao Liu, Chenyu Gao, Lianrui Zuo, Michael E. Kim, Brian D. Boyd, Lisa L. Barnes, Walter A. Kukull, Lori L. Beason-Held, Susan M. Resnick, Timothy J. Hohman, Warren D. Taylor, Bennett A. Landman
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:468-487, 2026.

Abstract

Modern deep learning methods have achieved impressive results across tasks ranging from disease classification and continuous biomarker estimation to realistic medical image generation. Most of these approaches are trained to model conditional distributions defined by a specific predictive direction with a specific set of input variables. We introduce MetaVoxel, a generative framework that models the joint distribution over imaging data and clinical metadata by learning a single diffusion process spanning all variables. By capturing the joint distribution, MetaVoxel unifies tasks that traditionally require separate conditional models and supports flexible zero-shot inference using arbitrary subsets of inputs without task-specific retraining. Using more than 10,000 T1-weighted MRI scans paired with clinical metadata from nine datasets, we show that a single MetaVoxel model can perform image generation, age estimation, and sex prediction, achieving performance comparable to established task-specific baselines. Additional experiments highlight its capabilities for flexible inference. Together, these findings demonstrate that joint multimodal diffusion offers a promising direction for unifying medical AI models and enabling broader clinical applicability.
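To make the core idea concrete, below is a minimal sketch of joint diffusion over an image and its metadata: both are noised with a shared timestep and a single network is trained to denoise all variables at once. Everything here (the `JointDenoiser` architecture, the crude linear noise schedule, the toy dimensions) is an illustrative assumption for exposition, not the authors' MetaVoxel implementation.

```python
import torch
import torch.nn as nn

IMG_DIM = 8 * 8 * 8   # stand-in for flattened (latent) image voxels
META_DIM = 2          # e.g., normalized age and a sex encoding

class JointDenoiser(nn.Module):
    """Toy network predicting the noise added to image and metadata jointly."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IMG_DIM + META_DIM + 1, hidden),  # +1 for the timestep
            nn.SiLU(),
            nn.Linear(hidden, IMG_DIM + META_DIM),
        )

    def forward(self, x_img, x_meta, t):
        h = torch.cat([x_img, x_meta, t[:, None]], dim=1)
        eps = self.net(h)
        return eps[:, :IMG_DIM], eps[:, IMG_DIM:]

def joint_training_step(model, img, meta):
    """One denoising step with a *shared* timestep for all variables."""
    b = img.shape[0]
    t = torch.rand(b)                       # continuous time in [0, 1]
    a = (1.0 - t)[:, None]                  # crude linear schedule, for illustration
    eps_img, eps_meta = torch.randn_like(img), torch.randn_like(meta)
    x_img = a * img + (1 - a) * eps_img     # noise image and metadata together:
    x_meta = a * meta + (1 - a) * eps_meta  # this is the "joint" in joint diffusion
    pred_img, pred_meta = model(x_img, x_meta, t)
    # A single loss over all variables, rather than one conditional objective
    # per predictive direction.
    return ((pred_img - eps_img) ** 2).mean() + ((pred_meta - eps_meta) ** 2).mean()

model = JointDenoiser()
img = torch.randn(4, IMG_DIM)    # flattened toy "scans"
meta = torch.randn(4, META_DIM)  # paired toy metadata
joint_training_step(model, img, meta).backward()
```

One common way to realize the flexible zero-shot inference the abstract describes (though the paper's exact sampling procedure may differ) is inpainting-style sampling: during generation, any observed subset of variables is clamped to its known values (re-noised to the current timestep) while only the unknown variables are denoised, so the same trained model can serve image generation, age estimation, or sex prediction depending on which inputs are held fixed.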

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-liu26a,
  title     = {MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata},
  author    = {Liu, Yihao and Gao, Chenyu and Zuo, Lianrui and Kim, Michael E. and Boyd, Brian D. and Barnes, Lisa L. and Kukull, Walter A. and Beason-Held, Lori L. and Resnick, Susan M. and Hohman, Timothy J. and Taylor, Warren D. and Landman, Bennett A.},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages     = {468--487},
  year      = {2026},
  editor    = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume    = {315},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/liu26a/liu26a.pdf},
  url       = {https://proceedings.mlr.press/v315/liu26a.html},
  abstract  = {Modern deep learning methods have achieved impressive results across tasks from disease classification, estimating continuous biomarkers, to generating realistic medical images. Most of these approaches are trained to model conditional distributions defined by a specific predictive direction with a specific set of input variables. We introduce MetaVoxel, a generative joint diffusion modeling framework that models the joint distribution over imaging data and clinical metadata by learning a single diffusion process spanning all variables. By capturing the joint distribution, MetaVoxel unifies tasks that traditionally require separate conditional models and supports flexible zero-shot inference using arbitrary subsets of inputs without task-specific retraining. Using more than $10,000$ T1-weighted MRI scans paired with clinical metadata from nine datasets, we show that a single MetaVoxel model can perform image generation, age estimation, and sex prediction, achieving performance comparable to established task-specific baselines. Additional experiments highlight its capabilities for flexible inference. Together, these findings demonstrate that joint multimodal diffusion offers a promising direction for unifying medical AI models and enabling broader clinical applicability.}
}
Endnote
%0 Conference Paper
%T MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata
%A Yihao Liu
%A Chenyu Gao
%A Lianrui Zuo
%A Michael E. Kim
%A Brian D. Boyd
%A Lisa L. Barnes
%A Walter A. Kukull
%A Lori L. Beason-Held
%A Susan M. Resnick
%A Timothy J. Hohman
%A Warren D. Taylor
%A Bennett A. Landman
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-liu26a
%I PMLR
%P 468--487
%U https://proceedings.mlr.press/v315/liu26a.html
%V 315
%X Modern deep learning methods have achieved impressive results across tasks from disease classification, estimating continuous biomarkers, to generating realistic medical images. Most of these approaches are trained to model conditional distributions defined by a specific predictive direction with a specific set of input variables. We introduce MetaVoxel, a generative joint diffusion modeling framework that models the joint distribution over imaging data and clinical metadata by learning a single diffusion process spanning all variables. By capturing the joint distribution, MetaVoxel unifies tasks that traditionally require separate conditional models and supports flexible zero-shot inference using arbitrary subsets of inputs without task-specific retraining. Using more than $10,000$ T1-weighted MRI scans paired with clinical metadata from nine datasets, we show that a single MetaVoxel model can perform image generation, age estimation, and sex prediction, achieving performance comparable to established task-specific baselines. Additional experiments highlight its capabilities for flexible inference. Together, these findings demonstrate that joint multimodal diffusion offers a promising direction for unifying medical AI models and enabling broader clinical applicability.
APA
Liu, Y., Gao, C., Zuo, L., Kim, M.E., Boyd, B.D., Barnes, L.L., Kukull, W.A., Beason-Held, L.L., Resnick, S.M., Hohman, T.J., Taylor, W.D. & Landman, B.A. (2026). MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:468-487. Available from https://proceedings.mlr.press/v315/liu26a.html.
