Semi-supervised Meta-learning for Multi-source Heterogeneity in Time-series Data

Lida Zhang; Bobak J. Mortazavi

Proceedings of the 8th Machine Learning for Healthcare Conference, PMLR 219:923-941, 2023.

Abstract

Real-world time-series data is riddled with heterogeneity that is often present across a number of dataset dimensions: features, labels, and time-varying factors. The heterogeneity in time-series data may be raised by introducing new features, missing data, and domain shifts in the feature dimension, and the difficulty of collecting promising ground truth results in label uncertainty. In addition, the variation on the time manner further aggravates the complexity of data heterogeneity, since the features and labels may change on the same sequence of data over time. Many machine learning techniques have been proposed to address the data heterogeneity, including transfer learning, meta-learning, semi-supervised learning, recurrent networks, etc. However, each of these techniques is limited to one type of heterogeneity. In this study, we seek to create adaptable models for the multi-source heterogeneity in time-series data. We propose a semi-supervised-based meta-learning (SSML) with an adversarial training mechanism simultaneously addressing the heterogeneous features and labeling uncertainty, a time domain variation (TDV) framework to apply SSML and transfer learning for the third level of data heterogeneity. We test our models on two medical datasets, PhysioNet Challenge 2012 and MIMIC-III ICU dataset, and improve over all benchmark models. Our code is available at https://github.com/lidazhang/ssml-time-series-heterogeneity.git.

Cite this Paper

BibTeX


@InProceedings{pmlr-v219-zhang23a,
  title = 	 {Semi-supervised Meta-learning for Multi-source Heterogeneity in Time-series Data},
  author =       {Zhang, Lida and Mortazavi, Bobak J.},
  booktitle = 	 {Proceedings of the 8th Machine Learning for Healthcare Conference},
  pages = 	 {923--941},
  year = 	 {2023},
  editor = 	 {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo and Yeung, Serene},
  volume = 	 {219},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {11--12 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v219/zhang23a/zhang23a.pdf},
  url = 	 {https://proceedings.mlr.press/v219/zhang23a.html},
  abstract = 	 {Real-world time-series data is riddled with heterogeneity that is often present across a number of dataset dimensions: features, labels, and time-varying factors. The heterogeneity in time-series data may be raised by introducing new features, missing data, and domain shifts in the feature dimension, and the difficulty of collecting promising ground truth results in label uncertainty. In addition, the variation on the time manner further aggravates the complexity of data heterogeneity, since the features and labels may change on the same sequence of data over time. Many machine learning techniques have been proposed to address the data heterogeneity, including transfer learning, meta-learning, semi-supervised learning, recurrent networks, etc. However, each of these techniques is limited to one type of heterogeneity. In this study, we seek to create adaptable models for the multi-source heterogeneity in time-series data. We propose a semi-supervised-based meta-learning (SSML) with an adversarial training mechanism simultaneously addressing the heterogeneous features and labeling uncertainty, a time domain variation (TDV) framework to apply SSML and transfer learning for the third level of data heterogeneity. We test our models on two medical datasets, PhysioNet Challenge 2012 and MIMIC-III ICU dataset, and improve over all benchmark models. Our code is available at https://github.com/lidazhang/ssml-time-series-heterogeneity.git.}
}

Endnote

%0 Conference Paper
%T Semi-supervised Meta-learning for Multi-source Heterogeneity in Time-series Data
%A Lida Zhang
%A Bobak J. Mortazavi
%B Proceedings of the 8th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Kaivalya Deshpande
%E Madalina Fiterau
%E Shalmali Joshi
%E Zachary Lipton
%E Rajesh Ranganath
%E Iñigo Urteaga
%E Serene Yeung
%F pmlr-v219-zhang23a
%I PMLR
%P 923--941
%U https://proceedings.mlr.press/v219/zhang23a.html
%V 219
%X Real-world time-series data is riddled with heterogeneity that is often present across a number of dataset dimensions: features, labels, and time-varying factors. The heterogeneity in time-series data may be raised by introducing new features, missing data, and domain shifts in the feature dimension, and the difficulty of collecting promising ground truth results in label uncertainty. In addition, the variation on the time manner further aggravates the complexity of data heterogeneity, since the features and labels may change on the same sequence of data over time. Many machine learning techniques have been proposed to address the data heterogeneity, including transfer learning, meta-learning, semi-supervised learning, recurrent networks, etc. However, each of these techniques is limited to one type of heterogeneity. In this study, we seek to create adaptable models for the multi-source heterogeneity in time-series data. We propose a semi-supervised-based meta-learning (SSML) with an adversarial training mechanism simultaneously addressing the heterogeneous features and labeling uncertainty, a time domain variation (TDV) framework to apply SSML and transfer learning for the third level of data heterogeneity. We test our models on two medical datasets, PhysioNet Challenge 2012 and MIMIC-III ICU dataset, and improve over all benchmark models. Our code is available at https://github.com/lidazhang/ssml-time-series-heterogeneity.git.

APA


Zhang, L. & Mortazavi, B.J. (2023). Semi-supervised Meta-learning for Multi-source Heterogeneity in Time-series Data. Proceedings of the 8th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 219:923-941. Available from https://proceedings.mlr.press/v219/zhang23a.html.
