High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding

Hannah Marienwald, Jean-Baptiste Fermanian, Gilles Blanchard
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1963-1971, 2021.

Abstract

We propose an improved estimator for the multi-task averaging problem, whose goal is the joint estimation of the means of multiple distributions using separate, independent data sets. The naive approach is to take the empirical mean of each data set individually, whereas the proposed method exploits similarities between tasks, without any related information being known in advance. First, for each data set, similar or neighboring means are determined from the data by multiple testing. Then each naive estimator is shrunk towards the local average of its neighbors. We prove theoretically that this approach yields a reduction in mean squared error. This improvement can be significant when the dimension of the input space is large, demonstrating a “blessing of dimensionality” phenomenon. An application of this approach is the estimation of multiple kernel mean embeddings, which play an important role in many modern applications. The theoretical results are verified on artificial and real-world data.
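The abstract's test-then-shrink idea can be illustrated with a toy sketch. Note the assumptions: the paper's multiple-testing step is replaced here by a crude distance threshold `tau` on the naive means, and a single fixed shrinkage weight `gamma` is used; both are hypothetical simplifications, not the authors' actual procedure.

```python
import numpy as np

def multi_task_means(datasets, tau=1.0, gamma=0.5):
    """Toy illustration of test-then-shrink multi-task averaging.

    datasets : list of (n_b, d) arrays, one sample set per task.
    tau      : distance threshold for declaring two tasks "neighbors"
               (a crude stand-in for the paper's multiple-testing step).
    gamma    : fixed shrinkage weight toward the local neighbor average.
    """
    # Naive approach: the empirical mean of each data set individually.
    naive = [X.mean(axis=0) for X in datasets]
    B = len(naive)
    improved = []
    for b in range(B):
        # "Testing" step (simplified): tasks whose naive means are close
        # to task b's are treated as its neighbors (b is its own neighbor).
        neighbors = [c for c in range(B)
                     if np.linalg.norm(naive[b] - naive[c]) <= tau]
        local_avg = np.mean([naive[c] for c in neighbors], axis=0)
        # Shrink the naive estimator toward the local average of neighbors.
        improved.append((1.0 - gamma) * naive[b] + gamma * local_avg)
    return improved
```

With `gamma=0` the sketch reduces to the naive per-task means; increasing `gamma` pools more information across tasks that the threshold test declares similar.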

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-marienwald21a,
  title     = {High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding},
  author    = {Marienwald, Hannah and Fermanian, Jean-Baptiste and Blanchard, Gilles},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1963--1971},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/marienwald21a/marienwald21a.pdf},
  url       = {https://proceedings.mlr.press/v130/marienwald21a.html},
  abstract  = {We propose an improved estimator for the multi-task averaging problem, whose goal is the joint estimation of the means of multiple distributions using separate, independent data sets. The naive approach is to take the empirical mean of each data set individually, whereas the proposed method exploits similarities between tasks, without any related information being known in advance. First, for each data set, similar or neighboring means are determined from the data by multiple testing. Then each naive estimator is shrunk towards the local average of its neighbors. We prove theoretically that this approach provides a reduction in mean squared error. This improvement can be significant when the dimension of the input space is large; demonstrating a “blessing of dimensionality” phenomenon. An application of this approach is the estimation of multiple kernel mean embeddings, which plays an important role in many modern applications. The theoretical results are verified on artificial and real world data.}
}
Endnote
%0 Conference Paper
%T High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding
%A Hannah Marienwald
%A Jean-Baptiste Fermanian
%A Gilles Blanchard
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-marienwald21a
%I PMLR
%P 1963--1971
%U https://proceedings.mlr.press/v130/marienwald21a.html
%V 130
%X We propose an improved estimator for the multi-task averaging problem, whose goal is the joint estimation of the means of multiple distributions using separate, independent data sets. The naive approach is to take the empirical mean of each data set individually, whereas the proposed method exploits similarities between tasks, without any related information being known in advance. First, for each data set, similar or neighboring means are determined from the data by multiple testing. Then each naive estimator is shrunk towards the local average of its neighbors. We prove theoretically that this approach provides a reduction in mean squared error. This improvement can be significant when the dimension of the input space is large; demonstrating a “blessing of dimensionality” phenomenon. An application of this approach is the estimation of multiple kernel mean embeddings, which plays an important role in many modern applications. The theoretical results are verified on artificial and real world data.
APA
Marienwald, H., Fermanian, J.-B., & Blanchard, G. (2021). High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1963-1971. Available from https://proceedings.mlr.press/v130/marienwald21a.html.

Related Material