Tuning In: Comparative Analysis of Audio Classifier Performance in Clinical Settings with Limited Data

Hamza Mahdi, Eptehal Nashnoush, Rami Saab, Arjun Balachandar, Rishit Dagli, Lucas Perri, Houman Khosravani
Proceedings of the fifth Conference on Health, Inference, and Learning, PMLR 248:446-460, 2024.

Abstract

This study assesses deep learning models for audio classification in a clinical setting under the constraint of small datasets, reflecting the prospective collection of real-world data. We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models such as ViT and Swin, and compare them against pretrained audio models such as AST, YAMNet, and VGGish. Our method highlights the benefits of pretraining on large datasets before fine-tuning on specific clinical data. We prospectively collected two first-of-their-kind patient audio datasets from stroke patients. We investigated various preprocessing techniques, finding that RGB and grayscale spectrogram transformations affect model performance differently depending on the priors the models learn during pretraining. Our findings indicate that CNNs can match or exceed transformer models in small-dataset contexts, with the DenseNet-Contrastive and AST models showing notable performance. This study highlights the significance of incremental marginal gains through model selection, pretraining, and preprocessing in sound classification, offering valuable insights for clinical diagnostics that rely on audio classification.
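As a concrete illustration of the pipeline the abstract describes, here is a minimal sketch, assuming PyTorch with torchaudio and torchvision, of turning an audio clip into a log-mel spectrogram, replicating it to three channels (RGB) or keeping it single-channel (grayscale), and fine-tuning an ImageNet-pretrained DenseNet. The sample rate, mel parameters, two-class head, and dummy batch are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' code): log-mel spectrogram front end,
# RGB vs. grayscale image conversion, and fine-tuning a pretrained CNN.
import torch
import torchaudio
import torchvision

SAMPLE_RATE = 16_000  # assumed; clinical recordings may differ

# Log-mel spectrogram transform (parameters are illustrative)
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=1024, hop_length=256, n_mels=128
)
to_db = torchaudio.transforms.AmplitudeToDB()

def waveform_to_image(waveform: torch.Tensor, rgb: bool = True) -> torch.Tensor:
    """Return a (C, n_mels, time) 'image' tensor from a mono waveform."""
    spec = to_db(mel(waveform))                                     # (1, n_mels, time)
    spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-8)   # scale to [0, 1]
    return spec.repeat(3, 1, 1) if rgb else spec                    # RGB vs. grayscale

# ImageNet-pretrained DenseNet, reheaded for a 2-class clinical task (assumed)
model = torchvision.models.densenet121(weights="IMAGENET1K_V1")
model.classifier = torch.nn.Linear(model.classifier.in_features, 2)

# One illustrative fine-tuning step on a dummy batch of four 5-second clips
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
waveforms = torch.randn(4, 1, SAMPLE_RATE * 5)
batch = torch.stack([waveform_to_image(w) for w in waveforms])
labels = torch.randint(0, 2, (4,))

logits = model(batch)
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```

Swapping torchvision's densenet121 for a ConvNeXt, ViT, or Swin backbone, or replacing the spectrogram-image front end with an audio-pretrained model such as AST, follows the same fine-tuning pattern the paper compares.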

Cite this Paper


BibTeX
@InProceedings{pmlr-v248-mahdi24a,
  title     = {Tuning In: Comparative Analysis of Audio Classifier Performance in Clinical Settings with Limited Data},
  author    = {Mahdi, Hamza and Nashnoush, Eptehal and Saab, Rami and Balachandar, Arjun and Dagli, Rishit and Perri, Lucas and Khosravani, Houman},
  booktitle = {Proceedings of the fifth Conference on Health, Inference, and Learning},
  pages     = {446--460},
  year      = {2024},
  editor    = {Pollard, Tom and Choi, Edward and Singhal, Pankhuri and Hughes, Michael and Sizikova, Elena and Mortazavi, Bobak and Chen, Irene and Wang, Fei and Sarker, Tasmie and McDermott, Matthew and Ghassemi, Marzyeh},
  volume    = {248},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--28 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v248/main/assets/mahdi24a/mahdi24a.pdf},
  url       = {https://proceedings.mlr.press/v248/mahdi24a.html},
  abstract  = {This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting the prospective collection of real-world data. We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models like ViT, and SWIN, and compare them against pretrained audio models such as AST, YAMNet and VGGish. Our method highlights the benefits of pretraining on large datasets before fine-tuning on specific clinical data. We prospectively collected two first-of-its-kind patient audio datasets from stroke patients. We investigated various preprocessing techniques, finding that RGB and grayscale spectrogram transformations affect model performance differently based on the priors they learn from pretraining. Our findings indicate CNNs can match or exceed transformer models in small dataset contexts, with DenseNet-Contrastive and AST models showing notable performance. This study highlights the significance of incremental marginal gains through model selection, pretraining, and preprocessing in sound classification; this offers valuable insights for clinical diagnostics that rely on audio classification.}
}
Endnote
%0 Conference Paper
%T Tuning In: Comparative Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
%A Hamza Mahdi
%A Eptehal Nashnoush
%A Rami Saab
%A Arjun Balachandar
%A Rishit Dagli
%A Lucas Perri
%A Houman Khosravani
%B Proceedings of the fifth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Tom Pollard
%E Edward Choi
%E Pankhuri Singhal
%E Michael Hughes
%E Elena Sizikova
%E Bobak Mortazavi
%E Irene Chen
%E Fei Wang
%E Tasmie Sarker
%E Matthew McDermott
%E Marzyeh Ghassemi
%F pmlr-v248-mahdi24a
%I PMLR
%P 446--460
%U https://proceedings.mlr.press/v248/mahdi24a.html
%V 248
%X This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting the prospective collection of real-world data. We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models like ViT, and SWIN, and compare them against pretrained audio models such as AST, YAMNet and VGGish. Our method highlights the benefits of pretraining on large datasets before fine-tuning on specific clinical data. We prospectively collected two first-of-its-kind patient audio datasets from stroke patients. We investigated various preprocessing techniques, finding that RGB and grayscale spectrogram transformations affect model performance differently based on the priors they learn from pretraining. Our findings indicate CNNs can match or exceed transformer models in small dataset contexts, with DenseNet-Contrastive and AST models showing notable performance. This study highlights the significance of incremental marginal gains through model selection, pretraining, and preprocessing in sound classification; this offers valuable insights for clinical diagnostics that rely on audio classification.
APA
Mahdi, H., Nashnoush, E., Saab, R., Balachandar, A., Dagli, R., Perri, L., & Khosravani, H. (2024). Tuning In: Comparative Analysis of Audio Classifier Performance in Clinical Settings with Limited Data. Proceedings of the fifth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 248:446-460. Available from https://proceedings.mlr.press/v248/mahdi24a.html.