Med-Flamingo: a Multimodal Medical Few-shot Learner

Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Yash Dalmia, Jure Leskovec, Cyril Zakka, Eduardo Pontes Reis, Pranav Rajpurkar
Proceedings of the 3rd Machine Learning for Health Symposium, PMLR 225:353-367, 2023.

Abstract

Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) take a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which poses a significant limitation, as data is scarce in many medical applications, necessitating models that can learn from few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and, for the first time, enables multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app under https://github.com/snap-stanford/med-flamingo.
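
For readers who want to see what the few-shot interface described above looks like in practice, below is a minimal sketch that loads the released checkpoint and queries it with an interleaved image-text prompt in the OpenFlamingo style. This is an illustrative example, not the authors' evaluation pipeline: the Hugging Face repo id med-flamingo/med-flamingo, the checkpoint file name model.pt, the local LLaMA-7B path, and the example images and questions are assumptions to be verified against the released code.

# Minimal few-shot medical VQA sketch (Python). Assumes the open_flamingo
# package and a local LLaMA-7B checkpoint; the repo id, file names, and
# prompt content are illustrative placeholders.
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
from open_flamingo import create_model_and_transforms

llama_path = "/path/to/llama-7b"  # placeholder: local LLaMA-7B weights

# Build the OpenFlamingo-9B architecture that Med-Flamingo continues
# pre-training from.
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path=llama_path,
    tokenizer_path=llama_path,
    cross_attn_every_n_layers=4,
)

# Load the Med-Flamingo weights on top (assumed Hugging Face Hub location).
ckpt = hf_hub_download("med-flamingo/med-flamingo", "model.pt")
model.load_state_dict(torch.load(ckpt, map_location="cpu"), strict=False)
model.eval()

# One in-context shot followed by the query: <image> marks where each image
# is attended to, <|endofchunk|> closes each shot.
prompt = (
    "You are a helpful medical assistant. Answer the question about the image. "
    "<image>Question: Which organ is shown? Answer: the liver.<|endofchunk|>"
    "<image>Question: Which organ is shown? Answer:"
)
images = [Image.open(p) for p in ["shot1.png", "query.png"]]  # illustrative files

# vision_x has shape (batch, num_images, num_frames, channels, height, width).
vision_x = torch.stack([image_processor(im) for im in images]).unsqueeze(1).unsqueeze(0)
lang_x = tokenizer([prompt], return_tensors="pt")

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))

Under these assumptions, adding more shots is a matter of appending further image-question-answer chunks to the prompt and stacking the corresponding images into vision_x, which is what makes the model a few-shot learner rather than one requiring fine-tuning.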

Cite this Paper


BibTeX
@InProceedings{pmlr-v225-moor23a,
  title     = {Med-Flamingo: a Multimodal Medical Few-shot Learner},
  author    = {Moor, Michael and Huang, Qian and Wu, Shirley and Yasunaga, Michihiro and Dalmia, Yash and Leskovec, Jure and Zakka, Cyril and Reis, Eduardo Pontes and Rajpurkar, Pranav},
  booktitle = {Proceedings of the 3rd Machine Learning for Health Symposium},
  pages     = {353--367},
  year      = {2023},
  editor    = {Hegselmann, Stefan and Parziale, Antonio and Shanmugam, Divya and Tang, Shengpu and Asiedu, Mercy Nyamewaa and Chang, Serina and Hartvigsen, Tom and Singh, Harvineet},
  volume    = {225},
  series    = {Proceedings of Machine Learning Research},
  month     = {10 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v225/moor23a/moor23a.pdf},
  url       = {https://proceedings.mlr.press/v225/moor23a.html},
  abstract  = {Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) take a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which poses a significant limitation, as data is scarce in many medical applications, necessitating models that can learn from few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20\% in clinicians' ratings and, for the first time, enables multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app under \url{https://github.com/snap-stanford/med-flamingo}.}
}
Endnote
%0 Conference Paper
%T Med-Flamingo: a Multimodal Medical Few-shot Learner
%A Michael Moor
%A Qian Huang
%A Shirley Wu
%A Michihiro Yasunaga
%A Yash Dalmia
%A Jure Leskovec
%A Cyril Zakka
%A Eduardo Pontes Reis
%A Pranav Rajpurkar
%B Proceedings of the 3rd Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2023
%E Stefan Hegselmann
%E Antonio Parziale
%E Divya Shanmugam
%E Shengpu Tang
%E Mercy Nyamewaa Asiedu
%E Serina Chang
%E Tom Hartvigsen
%E Harvineet Singh
%F pmlr-v225-moor23a
%I PMLR
%P 353--367
%U https://proceedings.mlr.press/v225/moor23a.html
%V 225
%X Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) take a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which poses a significant limitation, as data is scarce in many medical applications, necessitating models that can learn from few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and, for the first time, enables multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app under https://github.com/snap-stanford/med-flamingo.
APA
Moor, M., Huang, Q., Wu, S., Yasunaga, M., Dalmia, Y., Leskovec, J., Zakka, C., Reis, E.P. & Rajpurkar, P. (2023). Med-Flamingo: a Multimodal Medical Few-shot Learner. Proceedings of the 3rd Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 225:353-367. Available from https://proceedings.mlr.press/v225/moor23a.html.
