An hybrid CNN-Transformer model based on multi-feature extraction and attention fusion mechanism for cerebral emboli classification

Yamil Vindas, Blaise Kevin Guepie, Marilys Almar, Emmanuel Roux, Philippe Delachartre
Proceedings of the 7th Machine Learning for Healthcare Conference, PMLR 182:270-296, 2022.

Abstract

When dealing with signal processing and deep learning for classification, the choice of inputting whether the raw signal or transforming it into a time-frequency representation (TFR) remains an open question. In this work, we propose a novel CNN-Transformer model based on multi-feature extraction and learnable representation attention weights per class to do classification with raw signals and TFRs. First, we start by extracting a TFR from the raw signal. Then, we train two models to extract intermediate representations from the raw signals and the TFRs. We use a CNN-Transformer model to process the raw signal and a 2D CNN for the TFR. Finally, we train a classifier that combines the outputs of both models (late fusion) using learnable and interpretable attention weights per class. We evaluate our approach on three medical datasets: a cerebral emboli dataset (HITS), and two electrocardiogram datasets, PTB and MIT-BIH, for heartbeat categorization. The results show that our multi-feature fusion approach improves the classification performance with respect to the use of a single feature method or other multi-feature fusion methods. Furthermore, it achieves state-of-the-art results on the HITS and PTB datasets with a classification accuracy of 93.4% and 99.7%, respectively. It also achieves excellent performance on the MIT-BIH dataset, with an accuracy of 98.4% and a lighter model than other state-of-the-art methods. What is more, our fusion method provides interpretable attention weights per class indicating the importance of each representation for the final decision of the classifier.
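The late-fusion step described in the abstract — combining the two branch outputs with learnable, interpretable attention weights per class — can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation; the shapes, the softmax normalization over representations, and all function names here are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_late_fusion(logits_raw, logits_tfr, attn_params):
    """Fuse two branch outputs with per-class attention weights.

    logits_raw, logits_tfr: (batch, num_classes) outputs of the
        raw-signal branch (CNN-Transformer) and the TFR branch (2D CNN).
    attn_params: (2, num_classes) learnable parameters; a softmax over
        the representation axis makes the weights for each class sum
        to 1, which is what makes them readable as an importance score
        of each representation for that class.
    """
    weights = softmax(attn_params, axis=0)           # (2, num_classes)
    stacked = np.stack([logits_raw, logits_tfr])     # (2, batch, num_classes)
    # Per-class weighted sum over the two representations.
    fused = (weights[:, None, :] * stacked).sum(axis=0)
    return fused, weights

# Toy example: batch of 4 samples, 3 classes.
rng = np.random.default_rng(0)
fused, weights = attention_late_fusion(
    rng.standard_normal((4, 3)),
    rng.standard_normal((4, 3)),
    rng.standard_normal((2, 3)),
)
```

In a real training setup `attn_params` would be optimized jointly with the fusion classifier; after training, inspecting `weights` column by column shows which representation (raw signal or TFR) dominates each class decision.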

Cite this Paper


BibTeX
@InProceedings{pmlr-v182-vindas22a,
  title     = {An hybrid CNN-Transformer model based on multi-feature extraction and attention fusion mechanism for cerebral emboli classification},
  author    = {Vindas, Yamil and Guepie, Blaise Kevin and Almar, Marilys and Roux, Emmanuel and Delachartre, Philippe},
  booktitle = {Proceedings of the 7th Machine Learning for Healthcare Conference},
  pages     = {270--296},
  year      = {2022},
  editor    = {Lipton, Zachary and Ranganath, Rajesh and Sendak, Mark and Sjoding, Michael and Yeung, Serena},
  volume    = {182},
  series    = {Proceedings of Machine Learning Research},
  month     = {05--06 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v182/vindas22a/vindas22a.pdf},
  url       = {https://proceedings.mlr.press/v182/vindas22a.html},
  abstract  = {When dealing with signal processing and deep learning for classification, the choice of inputting whether the raw signal or transforming it into a time-frequency representation (TFR) remains an open question. In this work, we propose a novel CNN-Transformer model based on multi-feature extraction and learnable representation attention weights per class to do classification with raw signals and TFRs. First, we start by extracting a TFR from the raw signal. Then, we train two models to extract intermediate representations from the raw signals and the TFRs. We use a CNN-Transformer model to process the raw signal and a 2D CNN for the TFR. Finally, we train a classifier that combines the outputs of both models (late fusion) using learnable and interpretable attention weights per class. We evaluate our approach on three medical datasets: a cerebral emboli dataset (HITS), and two electrocardiogram datasets, PTB and MIT-BIH, for heartbeat categorization. The results show that our multi-feature fusion approach improves the classification performance with respect to the use of a single feature method or other multi-feature fusion methods. Furthermore, it achieves state-of-the-art results on the HITS and PTB datasets with a classification accuracy of 93.4% and 99.7%, respectively. It also achieves excellent performance on the MIT-BIH dataset, with an accuracy of 98.4% and a lighter model than other state-of-the-art methods. What is more, our fusion method provides interpretable attention weights per class indicating the importance of each representation for the final decision of the classifier.}
}
Endnote
%0 Conference Paper
%T An hybrid CNN-Transformer model based on multi-feature extraction and attention fusion mechanism for cerebral emboli classification
%A Yamil Vindas
%A Blaise Kevin Guepie
%A Marilys Almar
%A Emmanuel Roux
%A Philippe Delachartre
%B Proceedings of the 7th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Zachary Lipton
%E Rajesh Ranganath
%E Mark Sendak
%E Michael Sjoding
%E Serena Yeung
%F pmlr-v182-vindas22a
%I PMLR
%P 270--296
%U https://proceedings.mlr.press/v182/vindas22a.html
%V 182
%X When dealing with signal processing and deep learning for classification, the choice of inputting whether the raw signal or transforming it into a time-frequency representation (TFR) remains an open question. In this work, we propose a novel CNN-Transformer model based on multi-feature extraction and learnable representation attention weights per class to do classification with raw signals and TFRs. First, we start by extracting a TFR from the raw signal. Then, we train two models to extract intermediate representations from the raw signals and the TFRs. We use a CNN-Transformer model to process the raw signal and a 2D CNN for the TFR. Finally, we train a classifier that combines the outputs of both models (late fusion) using learnable and interpretable attention weights per class. We evaluate our approach on three medical datasets: a cerebral emboli dataset (HITS), and two electrocardiogram datasets, PTB and MIT-BIH, for heartbeat categorization. The results show that our multi-feature fusion approach improves the classification performance with respect to the use of a single feature method or other multi-feature fusion methods. Furthermore, it achieves state-of-the-art results on the HITS and PTB datasets with a classification accuracy of 93.4% and 99.7%, respectively. It also achieves excellent performance on the MIT-BIH dataset, with an accuracy of 98.4% and a lighter model than other state-of-the-art methods. What is more, our fusion method provides interpretable attention weights per class indicating the importance of each representation for the final decision of the classifier.
APA
Vindas, Y., Guepie, B.K., Almar, M., Roux, E. & Delachartre, P. (2022). An hybrid CNN-Transformer model based on multi-feature extraction and attention fusion mechanism for cerebral emboli classification. Proceedings of the 7th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 182:270-296. Available from https://proceedings.mlr.press/v182/vindas22a.html.