A Systematic Comparison of Data Representations for Transformer-Based ECG Arrhythmia Classification

Mona Aman, Godbright Uiso, Carine Mukamakuza, Vijayakumar Bhagavatula
Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 317:37-45, 2026.

Abstract

Automated electrocardiogram (ECG) classification plays a key role in detecting cardiac arrhythmias efficiently and objectively. Despite major advances in deep learning, there remains no consensus on whether one-dimensional (1D) temporal or two-dimensional (2D) time–frequency representations yield superior diagnostic accuracy. This study presents a controlled comparison between Vision Transformer (ViT) architectures trained on raw 1D ECG sequences and Short-Time Fourier Transform (STFT)-based 2D spectrograms using the CPSC2018 dataset. Both models share comparable architectures and parameter counts to isolate the effect of signal representation. The 1D-ViT achieved the highest overall accuracy (96.5%) and F1-score (96.5%), confirming that preserving temporal continuity is critical for arrhythmia discrimination. The 2D-ViT achieved lower accuracy (92.6%) due to temporal information loss, though it maintained competitive calibration (AUC 98.6%) and generalization. A bidirectional fusion model combining both encoders through cross-attention exhibited complementary behavior but did not surpass the 1D baseline. These findings indicate that while spectro-temporal information can enhance interpretability and stability, temporal-domain fidelity remains the dominant factor for reliable ECG classification.
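The two input representations compared in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the sampling rate, patch length, STFT frame length, hop size, Hann window, and the toy sinusoid standing in for an ECG lead are all assumptions chosen for illustration, and the helper names `patchify_1d` and `stft_magnitude` are hypothetical.

```python
import cmath
import math


def patchify_1d(signal, patch_len):
    """Split a 1D signal into non-overlapping patches (tokens), dropping any remainder."""
    n_patches = len(signal) // patch_len
    return [signal[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def stft_magnitude(signal, frame_len, hop):
    """Magnitude STFT with a Hann window.

    Returns a list of frames; each frame holds frame_len // 2 + 1 frequency bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy single-lead signal: a 5 Hz sinusoid, 1 second at an assumed 500 Hz sampling rate.
fs = 500
signal = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]

# 1D representation: a sequence of raw-sample patches (time order preserved exactly).
tokens_1d = patchify_1d(signal, patch_len=50)

# 2D representation: a time-frequency grid (frames x frequency bins).
spec = stft_magnitude(signal, frame_len=100, hop=50)

print(len(tokens_1d), len(tokens_1d[0]))  # number of 1D tokens, samples per token
print(len(spec), len(spec[0]))            # number of STFT frames, bins per frame
```

In this sketch each 1D token keeps every raw sample in order, while each STFT frame summarizes a 100-sample window as magnitudes per frequency bin, so timing within a frame is no longer explicit. That is the kind of trade-off between the two representations that the abstract refers to.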

Cite this Paper


BibTeX
@InProceedings{pmlr-v317-aman26a,
  title = {A Systematic Comparison of Data Representations for Transformer-Based ECG Arrhythmia Classification},
  author = {Aman, Mona and Uiso, Godbright and Mukamakuza, Carine and Bhagavatula, Vijayakumar},
  booktitle = {Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare},
  pages = {37--45},
  year = {2026},
  editor = {Wu, Junde and Pan, Jiazhen and Zhu, Jiayuan and Luo, Luyang and Li, Yitong and Xu, Min and Jin, Yueming and Rueckert, Daniel},
  volume = {317},
  series = {Proceedings of Machine Learning Research},
  month = {20--21 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v317/main/assets/aman26a/aman26a.pdf},
  url = {https://proceedings.mlr.press/v317/aman26a.html},
  abstract = {Automated electrocardiogram (ECG) classification plays a key role in detecting cardiac arrhythmias efficiently and objectively. Despite major advances in deep learning, there remains no consensus on whether one-dimensional (1D) temporal or two-dimensional (2D) time–frequency representations yield superior diagnostic accuracy. This study presents a controlled comparison between Vision Transformer (ViT) architectures trained on raw 1D ECG sequences and Short-Time Fourier Transform (STFT)-based 2D spectrograms using the CPSC2018 dataset. Both models share comparable architectures and parameter counts to isolate the effect of signal representation. The 1D-ViT achieved the highest overall accuracy (96.5%) and F1-score (96.5%), confirming that preserving temporal continuity is critical for arrhythmia discrimination. The 2D-ViT achieved lower accuracy (92.6%) due to temporal information loss, though it maintained competitive calibration (AUC 98.6%) and generalization. A bidirectional fusion model combining both encoders through cross-attention exhibited complementary behavior but did not surpass the 1D baseline. These findings indicate that while spectro-temporal information can enhance interpretability and stability, temporal-domain fidelity remains the dominant factor for reliable ECG classification.}
}
Endnote
%0 Conference Paper
%T A Systematic Comparison of Data Representations for Transformer-Based ECG Arrhythmia Classification
%A Mona Aman
%A Godbright Uiso
%A Carine Mukamakuza
%A Vijayakumar Bhagavatula
%B Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare
%C Proceedings of Machine Learning Research
%D 2026
%E Junde Wu
%E Jiazhen Pan
%E Jiayuan Zhu
%E Luyang Luo
%E Yitong Li
%E Min Xu
%E Yueming Jin
%E Daniel Rueckert
%F pmlr-v317-aman26a
%I PMLR
%P 37--45
%U https://proceedings.mlr.press/v317/aman26a.html
%V 317
%X Automated electrocardiogram (ECG) classification plays a key role in detecting cardiac arrhythmias efficiently and objectively. Despite major advances in deep learning, there remains no consensus on whether one-dimensional (1D) temporal or two-dimensional (2D) time–frequency representations yield superior diagnostic accuracy. This study presents a controlled comparison between Vision Transformer (ViT) architectures trained on raw 1D ECG sequences and Short-Time Fourier Transform (STFT)-based 2D spectrograms using the CPSC2018 dataset. Both models share comparable architectures and parameter counts to isolate the effect of signal representation. The 1D-ViT achieved the highest overall accuracy (96.5%) and F1-score (96.5%), confirming that preserving temporal continuity is critical for arrhythmia discrimination. The 2D-ViT achieved lower accuracy (92.6%) due to temporal information loss, though it maintained competitive calibration (AUC 98.6%) and generalization. A bidirectional fusion model combining both encoders through cross-attention exhibited complementary behavior but did not surpass the 1D baseline. These findings indicate that while spectro-temporal information can enhance interpretability and stability, temporal-domain fidelity remains the dominant factor for reliable ECG classification.
APA
Aman, M., Uiso, G., Mukamakuza, C. & Bhagavatula, V. (2026). A Systematic Comparison of Data Representations for Transformer-Based ECG Arrhythmia Classification. Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, in Proceedings of Machine Learning Research 317:37-45. Available from https://proceedings.mlr.press/v317/aman26a.html.
