Data-Driven Discovery of Feature Groups in Clinical Time Series

Fedor Sergeev, Manuel Burger, Polina Leshetkina, Vincent Fortuin, Gunnar Rätsch, Rita Kuznetsova
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:167-201, 2026.

Abstract

Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.
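The core idea above, grouping features by clustering the weights of their feature-wise embedding layers, can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the distance measure, or how clustering is interleaved with supervised training, so everything below is an assumption made for illustration: plain k-means stands in for the paper's procedure, it is applied once to a fixed weight matrix rather than during training, and the names `kmeans`, `group_features`, `toy_weights`, and `toy_names` are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means; returns one integer cluster label per row."""
    points = np.asarray(points, dtype=float)
    # Deterministic farthest-point initialisation: start from row 0, then
    # repeatedly add the row farthest from the centroids chosen so far.
    chosen = [0]
    for _ in range(1, k):
        diffs = points[:, None, :] - points[chosen][None, :, :]
        nearest = (diffs ** 2).sum(axis=-1).min(axis=1)
        chosen.append(int(nearest.argmax()))
    centroids = points[chosen].copy()

    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every row to its closest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if empty.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels


def group_features(embedding_weights, feature_names, n_groups):
    """Map each cluster label to the names of the features assigned to it."""
    labels = kmeans(embedding_weights, n_groups)
    groups = {}
    for name, label in zip(feature_names, labels):
        groups.setdefault(int(label), []).append(name)
    return groups


# Toy, made-up weights (assumption): one row per feature, as if each feature
# had its own embedding layer mapping a scalar to a 2-dimensional vector.
toy_weights = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
toy_names = ["f0", "f1", "f2", "f3", "f4", "f5"]  # placeholder feature names
print(group_features(toy_weights, toy_names, n_groups=2))
```

In this toy setting, features whose embedding weights lie close together end up in the same group. In a trained model the rows would instead come from the learned feature-wise embedding layers, and the number of groups would be a hyperparameter.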

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-sergeev26a,
  title = {Data-Driven Discovery of Feature Groups in Clinical Time Series},
  author = {Sergeev, Fedor and Burger, Manuel and Leshetkina, Polina and Fortuin, Vincent and R{\"a}tsch, Gunnar and Kuznetsova, Rita},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {167--201},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gr{\"o}ger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/sergeev26a/sergeev26a.pdf},
  url = {https://proceedings.mlr.press/v297/sergeev26a.html},
  abstract = {Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.}
}
Endnote
%0 Conference Paper
%T Data-Driven Discovery of Feature Groups in Clinical Time Series
%A Fedor Sergeev
%A Manuel Burger
%A Polina Leshetkina
%A Vincent Fortuin
%A Gunnar Rätsch
%A Rita Kuznetsova
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-sergeev26a
%I PMLR
%P 167--201
%U https://proceedings.mlr.press/v297/sergeev26a.html
%V 297
%X Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.
APA
Sergeev, F., Burger, M., Leshetkina, P., Fortuin, V., Rätsch, G., & Kuznetsova, R. (2026). Data-Driven Discovery of Feature Groups in Clinical Time Series. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:167-201. Available from https://proceedings.mlr.press/v297/sergeev26a.html.
