ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice

Stefan Kraft; Andreas Theissler; Dr. Vera Wienhausen-Wilke; Philipp Walter; Gjergji Kasneci; Hendrik Lensch

Proceedings of the sixth Conference on Health, Inference, and Learning, PMLR 287:395-429, 2025.

Abstract

Detecting arousals during sleep is crucial for diagnosing sleep disorders, yet the adoption of Machine Learning (ML) in clinical practice is hindered by a mismatch between clinical protocols and ML methods. Clinicians typically annotate only arousal onsets, whereas ML approaches conventionally rely on annotations for both the beginning and end. Moreover, no standardized evaluation methodology exists that is tailored to the specific needs of arousal detection in clinical practice. We address these challenges by proposing a novel post-processing and evaluation framework - Approximate Localization and Precise Event Count (ALPEC) - which optimizes arousal detectors to reflect operational priorities. We further advocate focusing on arousal onset detection and assess the impact of this on current training and evaluation schemes, addressing associated simplifications and challenges. Finally, we introduce a novel polysomnographic dataset that reflects the aforementioned clinical annotation constraints and includes modalities absent from existing datasets, demonstrating the benefits of leveraging multimodal data for arousal onset detection. Our contributions significantly advance the integration of ML-based arousal detection into clinical settings, narrowing the gap between technological advancements and clinical requirements.

Cite this Paper

BibTeX

@InProceedings{pmlr-v287-kraft25a,
  title = 	 {ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice},
  author =       {Kraft, Stefan and Theissler, Andreas and Wienhausen-Wilke, Vera and Walter, Philipp and Kasneci, Gjergji and Lensch, Hendrik},
  booktitle = 	 {Proceedings of the sixth Conference on Health, Inference, and Learning},
  pages = 	 {395--429},
  year = 	 {2025},
  editor = 	 {Xu, Xuhai Orson and Choi, Edward and Singhal, Pankhuri and Gerych, Walter and Tang, Shengpu and Agrawal, Monica and Subbaswamy, Adarsh and Sizikova, Elena and Dunn, Jessilyn and Daneshjou, Roxana and Sarker, Tasmie and McDermott, Matthew and Chen, Irene},
  volume = 	 {287},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v287/main/assets/kraft25a/kraft25a.pdf},
  url = 	 {https://proceedings.mlr.press/v287/kraft25a.html},
  abstract = 	 {Detecting arousals during sleep is crucial for diagnosing sleep disorders, yet the adoption of Machine Learning (ML) in clinical practice is hindered by a mismatch between clinical protocols and ML methods. Clinicians typically annotate only arousal onsets, whereas ML approaches conventionally rely on annotations for both the beginning and end. Moreover, no standardized evaluation methodology exists that is tailored to the specific needs of arousal detection in clinical practice. We address these challenges by proposing a novel post-processing and evaluation framework - Approximate Localization and Precise Event Count (ALPEC) - which optimizes arousal detectors to reflect operational priorities. We further advocate focusing on arousal onset detection and assess the impact of this on current training and evaluation schemes, addressing associated simplifications and challenges. Finally, we introduce a novel polysomnographic dataset that reflects the aforementioned clinical annotation constraints and includes modalities absent from existing datasets, demonstrating the benefits of leveraging multimodal data for arousal onset detection. Our contributions significantly advance the integration of ML-based arousal detection into clinical settings, narrowing the gap between technological advancements and clinical requirements.}
}

Endnote

%0 Conference Paper
%T ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice
%A Stefan Kraft
%A Andreas Theissler
%A Vera Wienhausen-Wilke
%A Philipp Walter
%A Gjergji Kasneci
%A Hendrik Lensch
%B Proceedings of the sixth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Xuhai Orson Xu
%E Edward Choi
%E Pankhuri Singhal
%E Walter Gerych
%E Shengpu Tang
%E Monica Agrawal
%E Adarsh Subbaswamy
%E Elena Sizikova
%E Jessilyn Dunn
%E Roxana Daneshjou
%E Tasmie Sarker
%E Matthew McDermott
%E Irene Chen
%F pmlr-v287-kraft25a
%I PMLR
%P 395--429
%U https://proceedings.mlr.press/v287/kraft25a.html
%V 287
%X Detecting arousals during sleep is crucial for diagnosing sleep disorders, yet the adoption of Machine Learning (ML) in clinical practice is hindered by a mismatch between clinical protocols and ML methods. Clinicians typically annotate only arousal onsets, whereas ML approaches conventionally rely on annotations for both the beginning and end. Moreover, no standardized evaluation methodology exists that is tailored to the specific needs of arousal detection in clinical practice. We address these challenges by proposing a novel post-processing and evaluation framework - Approximate Localization and Precise Event Count (ALPEC) - which optimizes arousal detectors to reflect operational priorities. We further advocate focusing on arousal onset detection and assess the impact of this on current training and evaluation schemes, addressing associated simplifications and challenges. Finally, we introduce a novel polysomnographic dataset that reflects the aforementioned clinical annotation constraints and includes modalities absent from existing datasets, demonstrating the benefits of leveraging multimodal data for arousal onset detection. Our contributions significantly advance the integration of ML-based arousal detection into clinical settings, narrowing the gap between technological advancements and clinical requirements.

APA

Kraft, S., Theissler, A., Wienhausen-Wilke, V., Walter, P., Kasneci, G. & Lensch, H. (2025). ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice. Proceedings of the sixth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 287:395-429. Available from https://proceedings.mlr.press/v287/kraft25a.html.
