MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection

Xu Cao; Wenqian Ye; Kenny Moise; Megan Coffee

Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:171-185, 2025.

Abstract

In the aftermath of the COVID-19 pandemic and amid accelerating climate change, emerging infectious diseases, particularly those arising from zoonotic spillover, remain a global threat. Mpox (caused by the monkeypox virus) is a notable example of a zoonotic infection that often goes undiagnosed, especially as its rash progresses through stages, complicating detection across diverse populations with different presentations. In August 2024, the WHO Director-General declared the mpox outbreak a public health emergency of international concern for a second time. Despite the deployment of deep learning techniques for detecting diseases from skin lesion images, a robust and publicly accessible foundation model for mpox diagnosis is still lacking due to the unavailability of open-source mpox skin lesion images, multimodal clinical data, and specialized training pipelines. To address this gap, we propose MpoxVLM, a vision-language model (VLM) designed to detect mpox by analyzing both skin lesion images and patient clinical information. MpoxVLM integrates the CLIP visual encoder, an enhanced Vision Transformer (ViT) classifier for skin lesions, and LLaMA-2-7B models, pre-trained and fine-tuned on visual instruction-following question-answer pairs from our newly released mpox skin lesion dataset. Our work achieves 90.38% accuracy for mpox detection, offering a promising pathway to improve early diagnostic accuracy in combating mpox.

Cite this Paper

BibTeX

@InProceedings{pmlr-v259-cao25a,
  title = 	 {MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection},
  author =       {Cao, Xu and Ye, Wenqian and Moise, Kenny and Coffee, Megan},
  booktitle = 	 {Proceedings of the 4th Machine Learning for Health Symposium},
  pages = 	 {171--185},
  year = 	 {2025},
  editor = 	 {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume = 	 {259},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--16 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v259/main/assets/cao25a/cao25a.pdf},
  url = 	 {https://proceedings.mlr.press/v259/cao25a.html},
  abstract = 	 {In the aftermath of the COVID-19 pandemic and amid accelerating climate change, emerging infectious diseases, particularly those arising from zoonotic spillover, remain a global threat. Mpox (caused by the monkeypox virus) is a notable example of a zoonotic infection that often goes undiagnosed, especially as its rash progresses through stages, complicating detection across diverse populations with different presentations. In August 2024, the WHO Director-General declared the mpox outbreak a public health emergency of international concern for a second time. Despite the deployment of deep learning techniques for detecting diseases from skin lesion images, a robust and publicly accessible foundation model for mpox diagnosis is still lacking due to the unavailability of open-source mpox skin lesion images, multimodal clinical data, and specialized training pipelines. To address this gap, we propose MpoxVLM, a vision-language model (VLM) designed to detect mpox by analyzing both skin lesion images and patient clinical information. MpoxVLM integrates the CLIP visual encoder, an enhanced Vision Transformer (ViT) classifier for skin lesions, and LLaMA-2-7B models, pre-trained and fine-tuned on visual instruction-following question-answer pairs from our newly released mpox skin lesion dataset. Our work achieves 90.38\% accuracy for mpox detection, offering a promising pathway to improve early diagnostic accuracy in combating mpox.}
}

Endnote

%0 Conference Paper
%T MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection
%A Xu Cao
%A Wenqian Ye
%A Kenny Moise
%A Megan Coffee
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-cao25a
%I PMLR
%P 171--185
%U https://proceedings.mlr.press/v259/cao25a.html
%V 259
%X In the aftermath of the COVID-19 pandemic and amid accelerating climate change, emerging infectious diseases, particularly those arising from zoonotic spillover, remain a global threat. Mpox (caused by the monkeypox virus) is a notable example of a zoonotic infection that often goes undiagnosed, especially as its rash progresses through stages, complicating detection across diverse populations with different presentations. In August 2024, the WHO Director-General declared the mpox outbreak a public health emergency of international concern for a second time. Despite the deployment of deep learning techniques for detecting diseases from skin lesion images, a robust and publicly accessible foundation model for mpox diagnosis is still lacking due to the unavailability of open-source mpox skin lesion images, multimodal clinical data, and specialized training pipelines. To address this gap, we propose MpoxVLM, a vision-language model (VLM) designed to detect mpox by analyzing both skin lesion images and patient clinical information. MpoxVLM integrates the CLIP visual encoder, an enhanced Vision Transformer (ViT) classifier for skin lesions, and LLaMA-2-7B models, pre-trained and fine-tuned on visual instruction-following question-answer pairs from our newly released mpox skin lesion dataset. Our work achieves 90.38% accuracy for mpox detection, offering a promising pathway to improve early diagnostic accuracy in combating mpox.

APA

Cao, X., Ye, W., Moise, K. & Coffee, M. (2025). MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:171-185. Available from https://proceedings.mlr.press/v259/cao25a.html.
