Self-Supervised Learning of ECG and PPG Signals for Multi-Modal Health Monitoring
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:350-358, 2025.
Abstract
Self-supervised multimodal time-series analysis faces critical challenges including cross-domain temporal shifts, sensor noise, and inter-subject variability, which degrade disease classification performance. Existing methods often depend on labeled data or explicit target domain alignment, limiting their clinical practicality. We propose TSTA-Net, a novel framework that integrates: (1) a residual spatiotemporal transformer (STN) to dynamically correct sensor shifts and motion artifacts, (2) a dual-branch Transformer for capturing long-range dependencies, and (3) hierarchical contrastive learning for spatiotemporal alignment of ECG and PPG signals. This integrated approach addresses both temporal dynamics and spatial inconsistencies through joint optimization. On atrial fibrillation detection, TSTA-Net achieves a 9.3% higher F1-score than state-of-the-art self-supervised methods, with ablation studies verifying that the spatiotemporal alignment mechanism contributes 68% of the performance gain. The lightweight framework (<1M parameters) reduces annotation dependency while enabling real-time arrhythmia screening on wearable devices, advancing self-supervised learning for practical healthcare applications.
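The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common way to align paired ECG and PPG embeddings: a generic symmetric InfoNCE loss. It is not the hierarchical loss used by TSTA-Net. The function names (`l2_normalize`, `info_nce`), the `temperature` value, and the assumption that each ECG window has exactly one matching PPG window at the same batch index are all illustrative choices, not details taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(ecg_embs, ppg_embs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: ecg_embs[i] and ppg_embs[i] come from the same time window
    of the same subject (a positive pair); every other combination in the
    batch is treated as a negative.
    """
    assert len(ecg_embs) == len(ppg_embs) and len(ecg_embs) > 0
    e = [l2_normalize(v) for v in ecg_embs]
    p = [l2_normalize(v) for v in ppg_embs]
    n = len(e)

    # sim[i][j] = cosine similarity between ECG i and PPG j, scaled by temperature
    sim = [
        [sum(a * b for a, b in zip(e[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(logits):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(logits):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(logits)

    # Average the ECG-to-PPG and PPG-to-ECG directions.
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


if __name__ == "__main__":
    ecg = [[1.0, 0.0], [0.0, 1.0]]
    ppg_matched = [[1.0, 0.0], [0.0, 1.0]]
    ppg_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print("matched:", info_nce(ecg, ppg_matched))
    print("swapped:", info_nce(ecg, ppg_swapped))
```

In this toy example the matched pairing should produce a much lower loss than the swapped pairing, which is the behavior a cross-modal alignment objective is meant to reward. In practice the embeddings would come from trained ECG and PPG encoders rather than hand-written vectors.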