A hybrid CNN-Transformer model based on multi-feature extraction and attention fusion mechanism for cerebral emboli classification
Proceedings of the 7th Machine Learning for Healthcare Conference, PMLR 182:270-296, 2022.
When applying deep learning to signal classification, whether to feed the model the raw signal or a time-frequency representation (TFR) of it remains an open question. In this work, we propose a novel CNN-Transformer model based on multi-feature extraction and learnable, per-class representation attention weights that classifies using both raw signals and TFRs. First, we extract a TFR from the raw signal. Then, we train two models to extract intermediate representations: a CNN-Transformer model for the raw signal and a 2D CNN for the TFR. Finally, we train a classifier that combines the outputs of both models (late fusion) using learnable and interpretable attention weights per class. We evaluate our approach on three medical datasets: a cerebral emboli dataset (HITS) and two electrocardiogram datasets, PTB and MIT-BIH, for heartbeat categorization. The results show that our multi-feature fusion approach improves classification performance over single-feature methods and other multi-feature fusion methods. Furthermore, it achieves state-of-the-art results on the HITS and PTB datasets, with classification accuracies of 93.4% and 99.7%, respectively. It also achieves excellent performance on the MIT-BIH dataset, with an accuracy of 98.4% and a lighter model than other state-of-the-art methods. Moreover, our fusion method provides interpretable attention weights per class that indicate the importance of each representation for the classifier's final decision.
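The late-fusion step with per-class attention weights can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: each branch is taken to output per-class logits, each class has one learnable score per representation, and a softmax over the two scores yields the per-class weights. The names `fuse_logits`, `attn_params`, `logits_raw`, and `logits_tfr` are hypothetical, and the random inputs merely stand in for the outputs of trained branches.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def fuse_logits(logits_raw, logits_tfr, attn_params):
    """Combine two branches with one attention weight per class and branch.

    logits_raw, logits_tfr: arrays of shape (batch, n_classes).
    attn_params: array of shape (n_classes, 2), assumed learnable.
    Returns the fused logits and the normalized per-class weights.
    """
    # Softmax over the two representations: each row sums to 1, so the
    # weights can be read as the relative importance of each branch.
    weights = softmax(attn_params, axis=1)
    fused = weights[:, 0] * logits_raw + weights[:, 1] * logits_tfr
    return fused, weights


# Hypothetical sizes and random stand-ins for the two branch outputs.
rng = np.random.default_rng(0)
n_classes = 3
logits_raw = rng.normal(size=(4, n_classes))  # raw-signal branch
logits_tfr = rng.normal(size=(4, n_classes))  # TFR branch

# Zero initialization gives both representations equal weight per class.
attn_params = np.zeros((n_classes, 2))

fused, weights = fuse_logits(logits_raw, logits_tfr, attn_params)
probs = softmax(fused, axis=1)  # class probabilities per sample
```

In a trained model, `attn_params` would be optimized together with the classifier, and inspecting `weights` row by row would show, for each class, how much the decision relies on the raw signal versus the TFR.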