Clustered Conformal Prediction for the Housing Market

Anders Hjort, Jonathan P. Williams, Johan Pensar
Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 230:366-386, 2024.

Abstract

Conformal prediction (CP) is a framework for constructing confidence sets around predictions from machine learning models with finite sample guarantees with few assumptions on both the prediction model and the data. In practice, the construction of CP sets typically relies on quantile estimates from an empirical distribution of non-conformity scores. When the data set consists of predefined, non-overlapping classes such as geographical regions, a common technique for improving the confidence sets is to calculate a different quantile for each class. However, the classwise quantile estimate suffers from high variance when the number of observations in each class is low. To circumvent this, one can share calibration data between classes with similar empirical distributions of non-conformity scores to reduce the variance of the quantile estimate. We study this approach for the application of house price prediction in the Norwegian housing market, where $286$ different municipalities serve as the initial classes of the data. We find that clustering together municipalities based on non-conformity score distributions, agnostic of the spatial distance between them, leads to CP sets that achieve, on average, a lower coverage gap in each municipality, in particular for the municipalities with few observations.
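The difference between the classwise quantile and a quantile pooled over similar classes can be sketched as follows. This is a minimal illustration and not the authors' implementation: the absolute-residual score, the quantile-based embedding of each class, the small k-means routine, the synthetic data, and all function names are assumptions made for the example. For brevity it also clusters and calibrates on the same scores, whereas a careful implementation would use separate splits for the two steps.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """k-th smallest score with k = ceil((n + 1) * (1 - alpha)); inf if k > n."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for a finite threshold
    return np.sort(scores)[k - 1]

def classwise_quantiles(scores, classes, alpha):
    """One conformal quantile per class, using only that class's scores."""
    return {c.item(): conformal_quantile(scores[classes == c], alpha)
            for c in np.unique(classes)}

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper, kept small on purpose)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def clustered_quantiles(scores, classes, alpha, n_clusters):
    """Cluster classes by their score distributions, then pool within clusters."""
    ids = np.unique(classes)
    # Assumed embedding: a few empirical quantiles of each class's scores.
    levels = [0.5, 0.6, 0.7, 0.8, 0.9]
    embed = np.array([np.quantile(scores[classes == c], levels) for c in ids])
    labels = kmeans(embed, n_clusters)
    q = {}
    for j in np.unique(labels):
        members = ids[labels == j]
        pooled = scores[np.isin(classes, members)]
        q_j = conformal_quantile(pooled, alpha)
        for c in members:
            q[c.item()] = q_j  # every class in the cluster shares one quantile
    return q

# Synthetic stand-in for municipalities: classes 0-2 have small residuals,
# classes 3-5 have residuals ten times larger.
rng = np.random.default_rng(0)
classes = np.repeat(np.arange(6), 400)
scale = np.where(classes < 3, 1.0, 10.0)
scores = np.abs(rng.normal(0.0, scale))  # |y - y_hat| as the assumed score

q_class = classwise_quantiles(scores, classes, alpha=0.1)
q_cluster = clustered_quantiles(scores, classes, alpha=0.1, n_clusters=2)
```

Under the assumed absolute-residual score, a prediction interval for a new observation in class `c` would be the point prediction plus or minus `q_cluster[c]`. Pooling gives each class a quantile estimated from more calibration points than the class has on its own, which is the variance reduction the abstract describes.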

Cite this Paper


BibTeX
@InProceedings{pmlr-v230-hjort24a,
  title = {Clustered Conformal Prediction for the Housing Market},
  author = {Hjort, Anders and Williams, Jonathan P. and Pensar, Johan},
  booktitle = {Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications},
  pages = {366--386},
  year = {2024},
  editor = {Vantini, Simone and Fontana, Matteo and Solari, Aldo and Boström, Henrik and Carlsson, Lars},
  volume = {230},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v230/main/assets/hjort24a/hjort24a.pdf},
  url = {https://proceedings.mlr.press/v230/hjort24a.html},
  abstract = {Conformal prediction (CP) is a framework for constructing confidence sets around predictions from machine learning models with finite sample guarantees with few assumptions on both the prediction model and the data. In practice, the construction of CP sets typically relies on quantile estimates from an empirical distribution of non-conformity scores. When the data set consists of predefined, non-overlapping classes such as geographical regions, a common technique for improving the confidence sets is to calculate a different quantile for each class. However, the classwise quantile estimate suffers from high variance when the number of observations in each class is low. To circumvent this, one can share calibration data between classes with similar empirical distributions of non-conformity scores to reduce the variance of the quantile estimate. We study this approach for the application of house price prediction in the Norwegian housing market, where $286$ different municipalities serve as the initial classes of the data. We find that clustering together municipalities based on non-conformity score distributions, agnostic of the spatial distance between them, leads to CP sets that achieve, on average, a lower coverage gap in each municipality, in particular for the municipalities with few observations.}
}
Endnote
%0 Conference Paper
%T Clustered Conformal Prediction for the Housing Market
%A Anders Hjort
%A Jonathan P. Williams
%A Johan Pensar
%B Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications
%C Proceedings of Machine Learning Research
%D 2024
%E Simone Vantini
%E Matteo Fontana
%E Aldo Solari
%E Henrik Boström
%E Lars Carlsson
%F pmlr-v230-hjort24a
%I PMLR
%P 366--386
%U https://proceedings.mlr.press/v230/hjort24a.html
%V 230
%X Conformal prediction (CP) is a framework for constructing confidence sets around predictions from machine learning models with finite sample guarantees with few assumptions on both the prediction model and the data. In practice, the construction of CP sets typically relies on quantile estimates from an empirical distribution of non-conformity scores. When the data set consists of predefined, non-overlapping classes such as geographical regions, a common technique for improving the confidence sets is to calculate a different quantile for each class. However, the classwise quantile estimate suffers from high variance when the number of observations in each class is low. To circumvent this, one can share calibration data between classes with similar empirical distributions of non-conformity scores to reduce the variance of the quantile estimate. We study this approach for the application of house price prediction in the Norwegian housing market, where $286$ different municipalities serve as the initial classes of the data. We find that clustering together municipalities based on non-conformity score distributions, agnostic of the spatial distance between them, leads to CP sets that achieve, on average, a lower coverage gap in each municipality, in particular for the municipalities with few observations.
APA
Hjort, A., Williams, J.P. & Pensar, J. (2024). Clustered Conformal Prediction for the Housing Market. Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, in Proceedings of Machine Learning Research 230:366-386. Available from https://proceedings.mlr.press/v230/hjort24a.html.
