Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets

Ola Spjuth, Robin Carrión Brännström, Lars Carlsson, Niharika Gauraha
Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications, PMLR 105:53-65, 2019.

Abstract

Conformal Prediction is a framework that produces prediction intervals based on the output from a machine learning algorithm. In this paper we explore the case where the training data is made up of multiple parts held in different sources that cannot be pooled. We consider the regression case and propose a method in which a conformal predictor is trained on each data source independently and the resulting prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository using support vector regression as the underlying machine learning algorithm, with varying numbers and sizes of data sources. The results show that the proposed method produces conservatively valid prediction intervals. While we cannot retain the same efficiency as when all data is used, the proposed approach improves efficiency compared with predicting from a single arbitrarily chosen source.
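The workflow described in the abstract (one conformal predictor per source, then a single combined interval) can be illustrated with a minimal sketch. Several choices here are assumptions and are not taken from the abstract: the combination rule (median of the lower bounds and median of the upper bounds), the use of inductive (split) conformal prediction with absolute-residual nonconformity scores, and a one-dimensional least-squares line standing in for support vector regression. All function names are hypothetical. The sketch makes no claim about the validity of the combined interval.

```python
import math
import random
from statistics import median


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the paper's SVR)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def train_source(xs, ys, calib_frac=0.3):
    """Train a split conformal regressor on a single source's data."""
    n_cal = max(1, int(len(xs) * calib_frac))
    # The first n_cal points are held out for calibration, the rest fit the model.
    a, b = fit_line(xs[n_cal:], ys[n_cal:])
    scores = sorted(abs(y - (a + b * x)) for x, y in zip(xs[:n_cal], ys[:n_cal]))
    return a, b, scores


def predict_interval(model, x, epsilon):
    """Prediction interval at significance level epsilon from one source."""
    a, b, scores = model
    n = len(scores)
    k = math.ceil((1 - epsilon) * (n + 1))
    # Too few calibration points for this epsilon gives an infinite interval.
    q = scores[k - 1] if k <= n else math.inf
    center = a + b * x
    return center - q, center + q


def combine_intervals(intervals):
    """Assumed combination rule: median of lower and of upper bounds."""
    lows, highs = zip(*intervals)
    return median(lows), median(highs)


def demo(n_sources=5, n_per_source=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_sources):
        # Each source keeps its own synthetic data; nothing is pooled.
        xs = [rng.uniform(0, 10) for _ in range(n_per_source)]
        ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
        models.append(train_source(xs, ys))
    # Only the per-source intervals are shared with the aggregator.
    per_source = [predict_interval(m, 5.0, epsilon) for m in models]
    return combine_intervals(per_source)


if __name__ == "__main__":
    print(demo())
```

Only the interval endpoints leave each source in this sketch, which is the property that matters when the underlying data cannot be disclosed or pooled.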

Cite this Paper


BibTeX
@InProceedings{pmlr-v105-spjuth19a,
  title = {Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets},
  author = {Spjuth, Ola and Br\"annstr\"om, Robin Carri\'on and Carlsson, Lars and Gauraha, Niharika},
  booktitle = {Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications},
  pages = {53--65},
  year = {2019},
  editor = {Gammerman, Alex and Vovk, Vladimir and Luo, Zhiyuan and Smirnov, Evgueni},
  volume = {105},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v105/spjuth19a/spjuth19a.pdf},
  url = {https://proceedings.mlr.press/v105/spjuth19a.html},
  abstract = {Conformal Prediction is a framework that produces prediction intervals based on the output from a machine learning algorithm. In this paper we explore the case when training data is made up of multiple parts available in different sources that cannot be pooled. We here consider the regression case and propose a method where a conformal predictor is trained on each data source independently, and where the prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository using support vector regression as the underlying machine learning algorithm, with varying number of data sources and sizes. The results show that the proposed method produces conservatively valid prediction intervals, and while we cannot retain the same efficiency as when all data is used, efficiency is improved through the proposed approach as compared to predicting using a single arbitrarily chosen source.}
}
Endnote
%0 Conference Paper
%T Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
%A Ola Spjuth
%A Robin Carrión Brännström
%A Lars Carlsson
%A Niharika Gauraha
%B Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications
%C Proceedings of Machine Learning Research
%D 2019
%E Alex Gammerman
%E Vladimir Vovk
%E Zhiyuan Luo
%E Evgueni Smirnov
%F pmlr-v105-spjuth19a
%I PMLR
%P 53--65
%U https://proceedings.mlr.press/v105/spjuth19a.html
%V 105
%X Conformal Prediction is a framework that produces prediction intervals based on the output from a machine learning algorithm. In this paper we explore the case when training data is made up of multiple parts available in different sources that cannot be pooled. We here consider the regression case and propose a method where a conformal predictor is trained on each data source independently, and where the prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository using support vector regression as the underlying machine learning algorithm, with varying number of data sources and sizes. The results show that the proposed method produces conservatively valid prediction intervals, and while we cannot retain the same efficiency as when all data is used, efficiency is improved through the proposed approach as compared to predicting using a single arbitrarily chosen source.
APA
Spjuth, O., Brännström, R. C., Carlsson, L., & Gauraha, N. (2019). Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets. Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications, in Proceedings of Machine Learning Research 105:53-65. Available from https://proceedings.mlr.press/v105/spjuth19a.html.
