On Statistical Learning Theory for Distributional Inputs

Christian Fiedler, Pierre-François Massiani, Friedrich Solowjow, Sebastian Trimpe
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:13608-13632, 2024.

Abstract

Kernel-based statistical learning on distributional inputs appears in many relevant applications, from medical diagnostics to causal inference, and poses intriguing theoretical questions. While this learning scenario has recently received considerable attention from the machine learning community, many gaps in the theory remain. In particular, most works consider only the distributional regression setting, and focus on the regularized least-squares algorithm for this problem. In this work, we start to fill these gaps. We prove two oracle inequalities for kernel machines in general distributional learning scenarios, as well as a generalization result based on algorithmic stability. Our main results are formulated in great generality, utilizing general Hilbertian embeddings, which makes them applicable to a wide array of approaches to distributional learning. Additionally, we specialize our results to the cases of kernel mean embeddings and of the recently introduced Hilbertian embeddings based on sliced Wasserstein distances, providing concrete instances of the general setup. Our results considerably enlarge the scope of theoretically grounded distributional learning, and open many interesting avenues for future work.
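To make the distributional regression setting concrete, below is a minimal illustrative sketch (not taken from the paper) of the standard kernel-mean-embedding approach the abstract refers to: each input distribution is represented by the empirical kernel mean embedding of its sample bag, a Gaussian kernel is placed on top of these embeddings, and regularized least squares (kernel ridge regression) is run on the resulting kernel matrix. All function names, bandwidths, and the toy data are assumptions made purely for illustration.

```python
# Sketch of kernel ridge regression on distributional inputs via empirical
# kernel mean embeddings (KMEs). This is the common two-stage setup the
# abstract refers to, NOT the paper's algorithm; names and bandwidths are
# illustrative assumptions.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Base kernel k(x, y) on the underlying sample space R^d.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def kme_inner(samples_P, samples_Q, sigma=1.0):
    # RKHS inner product of empirical embeddings: mean of k(x_i, y_j)
    # over all pairs of samples from the two bags.
    return gaussian_kernel(samples_P, samples_Q, sigma).mean()

def distribution_kernel(bags_A, bags_B, sigma=1.0, tau=1.0):
    # Gaussian kernel on distributions, K(P, Q) = exp(-||mu_P - mu_Q||^2 / (2 tau^2)),
    # computed from the empirical kernel mean embeddings of the sample bags.
    K = np.empty((len(bags_A), len(bags_B)))
    for i, P in enumerate(bags_A):
        for j, Q in enumerate(bags_B):
            d2 = (kme_inner(P, P, sigma) - 2 * kme_inner(P, Q, sigma)
                  + kme_inner(Q, Q, sigma))
            K[i, j] = np.exp(-d2 / (2 * tau ** 2))
    return K

# Toy data: each input is a bag of samples from a distribution; the label
# is (for this toy example) the mean of the first coordinate.
rng = np.random.default_rng(0)
bags = [rng.normal(loc=m, scale=0.5, size=(50, 2)) for m in rng.uniform(-2, 2, size=30)]
y = np.array([bag[:, 0].mean() for bag in bags])

# Regularized least squares (kernel ridge regression) on the distribution kernel.
lam = 1e-3
K = distribution_kernel(bags, bags)
alpha = np.linalg.solve(K + lam * len(bags) * np.eye(len(bags)), y)

# Predict for new bags of samples.
test_bags = [rng.normal(loc=m, scale=0.5, size=(50, 2)) for m in (-1.0, 0.5)]
y_pred = distribution_kernel(test_bags, bags) @ alpha
print(y_pred)
```

The same pipeline applies with other Hilbertian embeddings of distributions (e.g., those based on sliced Wasserstein distances mentioned in the abstract) by swapping out the embedding used to build the kernel matrix.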

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-fiedler24a,
  title     = {On Statistical Learning Theory for Distributional Inputs},
  author    = {Fiedler, Christian and Massiani, Pierre-Fran\c{c}ois and Solowjow, Friedrich and Trimpe, Sebastian},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {13608--13632},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/fiedler24a/fiedler24a.pdf},
  url       = {https://proceedings.mlr.press/v235/fiedler24a.html}
}
Endnote
%0 Conference Paper
%T On Statistical Learning Theory for Distributional Inputs
%A Christian Fiedler
%A Pierre-François Massiani
%A Friedrich Solowjow
%A Sebastian Trimpe
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-fiedler24a
%I PMLR
%P 13608--13632
%U https://proceedings.mlr.press/v235/fiedler24a.html
%V 235
APA
Fiedler, C., Massiani, P.-F., Solowjow, F. & Trimpe, S. (2024). On Statistical Learning Theory for Distributional Inputs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:13608-13632. Available from https://proceedings.mlr.press/v235/fiedler24a.html.