Distribution Regression for Sequential Data

Maud Lemercier, Cristopher Salvi, Theodoros Damoulas, Edwin Bonilla, Terry Lyons
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3754-3762, 2021.

Abstract

Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
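The feature-based idea can be made concrete with a minimal NumPy sketch. It makes several simplifying assumptions. Signatures are truncated at level 2. Each stream is time-augmented. The empirical expected signature of a group is taken as the mean of its streams' signatures. A closed-form ridge regression then maps that vector to the group label. The helper names (`signature_level2`, `add_time`, `expected_signature`, `fit_ridge`) and the synthetic data are hypothetical. This is not the authors' implementation, and the paper's own feature-based method is more elaborate.

```python
import numpy as np


def signature_level2(path):
    """Signature of a piecewise-linear path, truncated at level 2.

    path: array of shape (length, d). Returns a flat vector of size d + d * d.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity: append one linear segment (whose own level-2 term is
        # delta (x) delta / 2) to the signature accumulated so far.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])


def add_time(path):
    """Hypothetical helper: prepend a time channel running from 0 to 1."""
    path = np.asarray(path, dtype=float)
    t = np.linspace(0.0, 1.0, len(path))[:, None]
    return np.hstack([t, path])


def expected_signature(group):
    """Empirical expected signature: mean of the signatures of a group's streams.

    Streams may have different lengths; each maps to a vector of the same size.
    """
    return np.mean([signature_level2(add_time(p)) for p in group], axis=0)


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


# Hypothetical synthetic task: each group holds 20 random-walk streams of random
# length sharing a volatility sigma; only the group label sigma**2 is observed.
rng = np.random.default_rng(0)
groups, labels = [], []
for _ in range(40):
    sigma = rng.uniform(0.5, 2.0)
    streams = [np.cumsum(sigma * rng.normal(size=(rng.integers(30, 60), 2)), axis=0)
               for _ in range(20)]
    groups.append(streams)
    labels.append(sigma ** 2)

X = np.stack([expected_signature(g) for g in groups])  # one feature vector per group
w, b = fit_ridge(X, np.array(labels))
preds = X @ w + b
```

Every stream maps to a vector of the same size whatever its length, so groups of irregularly sampled streams can be compared directly. Raising the truncation level captures more of each group's law, at the cost of d^k extra features at level k.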

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-lemercier21a,
  title = {Distribution Regression for Sequential Data},
  author = {Lemercier, Maud and Salvi, Cristopher and Damoulas, Theodoros and Bonilla, Edwin and Lyons, Terry},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = {3754--3762},
  year = {2021},
  editor = {Banerjee, Arindam and Fukumizu, Kenji},
  volume = {130},
  series = {Proceedings of Machine Learning Research},
  month = {13--15 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v130/lemercier21a/lemercier21a.pdf},
  url = {https://proceedings.mlr.press/v130/lemercier21a.html},
  abstract = {Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.}
}
Endnote
%0 Conference Paper
%T Distribution Regression for Sequential Data
%A Maud Lemercier
%A Cristopher Salvi
%A Theodoros Damoulas
%A Edwin Bonilla
%A Terry Lyons
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-lemercier21a
%I PMLR
%P 3754--3762
%U https://proceedings.mlr.press/v130/lemercier21a.html
%V 130
%X Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
APA
Lemercier, M., Salvi, C., Damoulas, T., Bonilla, E. & Lyons, T. (2021). Distribution Regression for Sequential Data. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:3754-3762. Available from https://proceedings.mlr.press/v130/lemercier21a.html.
