Proceedings of Machine Learning Research
Proceedings of the Ninth Symposium on Conformal and Probabilistic Prediction and Applications, 9-11 September 2020.
Published as Volume 128 by the Proceedings of Machine Learning Research on 16 August 2020.
Volume Edited by:
Alexander Gammerman
Vladimir Vovk
Zhiyuan Luo
Evgueni Smirnov
Giovanni Cherubin
Series Editors:
Neil D. Lawrence
Mark Reid
https://proceedings.mlr.press/v128/
Application of conformal prediction interval estimations to market makers’ net positions
In this study we focus on the application of Conformal Prediction (CP) interval estimations to provide financial Market Makers (MMs) with meaningful forecasts of their future short-term position in a given financial market. The idea is that, using these market position forecasts, MMs can deploy proactive risk management strategies with a given degree of confidence. We make use of a novel financial time series dataset, NetPositionTimeSeries, that comprises the net positions of a given MM over a three-year period for trades pertaining to the top-traded Foreign Exchange (FX) symbols. This dataset is noisy and complex: the net positions within it are generated from the trades of tens of thousands of clients trading in different directions (buy or sell) and over many different time horizons. Since an accurate point estimate of the future net position is infeasible, we did not approach the problem as one of point prediction; rather, we sought a meaningful range of possible position bounds, which would nonetheless be invaluable. In this study we tested a range of predictive Machine Learning (ML) techniques and compared the CP framework to benchmark methods such as moving average (MA) and quantile regression (QR). We demonstrate that applying the CP framework gives well-calibrated region bounds on the MM net position forecasts.
https://proceedings.mlr.press/v128/wisniewski20a.html
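The interval construction that CP provides can be illustrated with a minimal split (inductive) conformal regression sketch. The data, the symmetric absolute-residual nonconformity score, and all parameter values below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Split (inductive) conformal regression interval.

    residuals_cal: absolute residuals |y - yhat| on a held-out calibration set.
    Returns a symmetric interval around the point prediction with coverage
    at least 1 - alpha under the exchangeability assumption.
    """
    n = len(residuals_cal)
    # Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest residual.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(residuals_cal)[min(k, n) - 1]
    return y_pred_test - q, y_pred_test + q

# Toy usage: pretend the underlying model always predicts 0 for N(0, 1) data.
rng = np.random.default_rng(0)
y_cal = rng.normal(0.0, 1.0, 500)
residuals = np.abs(y_cal - 0.0)
lo, hi = split_conformal_interval(residuals, y_pred_test=0.0, alpha=0.1)
```

For standard normal noise the 90% half-width should land near 1.645, the 90th percentile of |N(0, 1)|.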
Evaluating different approaches to calibrating conformal predictive systems
Conformal predictive systems (CPSs) provide probability distributions for real-valued labels of test examples, rather than point predictions (as output by regular regression models) or confidence intervals (as output by conformal regressors). The performance of a CPS is dependent on both the underlying model and the way in which the quality of its predictions is estimated; a stronger underlying model and a better quality estimation can significantly improve the performance. Recent studies have shown that conformal regressors that use random forests as the underlying model may benefit from using out-of-bag predictions for the calibration, rather than setting aside a separate calibration set, allowing for more data to be used for training and thereby improving the performance of the underlying model. These studies have furthermore shown that the quality of the individual predictions can be effectively estimated using the variance of the predictions or by k-nearest-neighbor models trained on the prediction errors. It is here investigated whether these methods are also effective in the context of split conformal predictive systems. Results from a large empirical study are presented, using 33 publicly available datasets. The results show that by using either variance or the k-nearest-neighbor method for estimating prediction quality, a significant increase in performance, as measured by the continuous ranked probability score, can be obtained compared to omitting the quality estimation. The results furthermore show that the use of out-of-bag examples for calibration is competitive with the most effective way of splitting training data into a proper training set and a calibration set, without requiring tuning of the calibration set size.
https://proceedings.mlr.press/v128/werner20a.html
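The out-of-bag calibration idea can be sketched as follows. The synthetic dataset, the forest size, and the absolute-residual score are assumptions for illustration, and the sketch shows a plain conformal regressor (interval output) rather than a full predictive system:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Out-of-bag (OOB) calibration: every training example gets an honest
# prediction from the trees that did not see it, so no separate
# calibration set has to be carved out of the training data.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (400, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 400)

rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=1)
rf.fit(X, y)

# OOB predictions stand in for held-out predictions.
oob_residuals = np.abs(y - rf.oob_prediction_)

alpha = 0.1
n = len(oob_residuals)
q = np.sort(oob_residuals)[int(np.ceil((n + 1) * (1 - alpha))) - 1]

x_test = np.array([[0.5]])
pred = rf.predict(x_test)[0]
interval = (pred - q, pred + q)
```

Because the noise standard deviation here is 0.2, the calibrated half-width q should be a little above 0.33 (the 90th percentile of |N(0, 0.2)|), reflecting residual model error on top of the noise.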
Conformal calibrators
Most existing examples of full conformal predictive systems, split conformal predictive systems, and cross-conformal predictive systems impose severe restrictions on the adaptation of predictive distributions to the test object at hand. In this paper we develop split conformal predictive systems that are fully adaptive. Our method consists in calibrating existing predictive systems; the input predictive system is not supposed to satisfy any properties of validity, whereas the output predictive system is guaranteed to be calibrated in probability.
https://proceedings.mlr.press/v128/vovk20a.html
Complete statistical theory of learning: learning using statistical invariants
Statistical theory of learning considers methods of constructing approximations that converge to the desired function with increasing number of observations. This theory studies mechanisms that provide convergence in the space of functions in $L_2$ norm, i.e., it studies the so-called strong mode of convergence. However, in Hilbert space, along with the convergence in the space of functions, there also exists the so-called weak mode of convergence, i.e., convergence in the space of functionals. Under some conditions, this weak mode of convergence also implies the convergence of approximations to the desired function in $L_2$ norm, although such convergence is based on other mechanisms. The paper discusses new learning methods which use both modes of convergence (weak and strong) simultaneously. Such methods allow one to execute the following: (1) select an admissible subset of functions (i.e., the set of appropriate approximation functions), and (2) find the desired approximation in this admissible subset. Since only two modes of convergence exist in Hilbert space, we call the theory that uses both modes the complete statistical theory of learning. Along with general reasoning, we describe new learning algorithms referred to as Learning Using Statistical Invariants (LUSI). LUSI algorithms were developed for sets of functions belonging to Reproducing Kernel Hilbert Space (RKHS); they include the modified SVM method (LUSI-SVM method). Also, the paper presents a LUSI modification of Neural Networks (LUSI-NN). LUSI methods require fewer training examples than standard approaches for achieving the same performance. In conclusion, the paper discusses the general (philosophical) framework of a new learning paradigm that includes the concept of intelligence.
https://proceedings.mlr.press/v128/vapnik20a.html
Fast probabilistic prediction for kernel SVM via enclosing balls
Support Vector Machine (SVM) is a powerful paradigm that has proven to be extremely useful for the task of classifying high-dimensional objects. It not only performs well in learning linear classifiers, but also shows outstanding performance in capturing non-linearity through the use of kernels. In principle, SVM allows us to train “scoring” classifiers, i.e., classifiers that output a prediction score. However, it can also be adapted to produce probability-type outputs through the use of the Venn-Abers framework. This allows us to obtain valuable information on the labels distribution for each test object. This procedure, however, is restricted to very small datasets given its inherent computational complexity. We circumvent this limitation by borrowing results from the field of computational geometry. Specifically, we make use of the concept of a coreset: a small summary of data that is constructed by discretising the input space into enclosing balls, so that each ball will be represented by only one object. Our results indicate that training Venn-Abers predictors using enclosing balls provides an average acceleration of 8 times compared to the regular Venn-Abers approach while largely retaining probability calibration. These promising results imply that we can still enjoy well-calibrated probabilistic outputs for kernel SVM even in the realm of large-scale datasets.
https://proceedings.mlr.press/v128/riquelme-granada20a.html
Conformal anomaly detection for visual reconstruction using gestalt principles
In this paper, we combine a modern machine learning technique called conformal predictors (CP) with elements of gestalt detection and apply them to the problem of visual perception in digital images. Our main task is to quantify several gestalt principles of visual reconstruction. We interpret an image/shape as being perceivable (meaningful) if it sufficiently deviates from randomness - in other words, the image could hardly happen by chance. These deviations from randomness are measured using a conformal prediction technique that can guarantee validity under certain assumptions. The technique detects perceivable images in a way that allows us to bound the number of false alarms, i.e., the proportion of non-perceivable images wrongly detected as perceivable.
https://proceedings.mlr.press/v128/nouretdinov20a.html
Conformal multi-target regression using neural networks
Multi-task learning is a domain that is still not fully studied in the conformal prediction framework, and this is particularly true for multi-target regression. Our work uses inductive conformal prediction along with deep neural networks to handle multi-target regression by exploring multiple extensions of existing single-target non-conformity measures and proposing new ones. This paper presents our approaches to working with conformal prediction in the multi-target regression setting, as well as the results of our experiments.
https://proceedings.mlr.press/v128/messoudi20a.html
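One way to extend a single-target nonconformity measure to several targets can be sketched with a max-across-targets score, which yields an axis-aligned hyper-rectangular prediction region. This is an illustrative choice, not necessarily one of the paper's measures:

```python
import numpy as np

def multi_target_region(residuals_cal, y_pred_test, alpha=0.1):
    """Split conformal region for multi-target regression.

    residuals_cal: (n_cal, n_targets) signed calibration residuals.
    Nonconformity score: the maximum absolute error across targets,
    so the resulting region is a hyper-cube around the prediction.
    """
    scores = np.max(np.abs(residuals_cal), axis=1)        # (n_cal,)
    n = len(scores)
    q = np.sort(scores)[int(np.ceil((n + 1) * (1 - alpha))) - 1]
    return y_pred_test - q, y_pred_test + q               # per-target bounds

# Toy usage with three targets and a zero prediction.
rng = np.random.default_rng(6)
res = rng.normal(0, 1, (300, 3))
lo, hi = multi_target_region(res, np.zeros(3))
```

Other natural extensions use a norm across targets (spherical regions) or per-target normalisation before taking the maximum.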
BERT-based conformal predictor for sentiment analysis
We deal with the Natural Language Processing (NLP) task of Sentiment Analysis (SA) on text, by applying Inductive Conformal Prediction (ICP) on a transformer-based model. SA, which is the interpretation and classification of emotions, also referred to as emotional artificial intelligence, can be set up as a Text Classification (TC) problem. Transformers are deep neural network models based on the attention mechanism and make use of transfer learning by being pretrained on a large unlabeled corpus. Transformer-based models have been the state of the art for dealing with various NLP tasks ever since they were proposed at the end of 2018. Our classifier consists of the BERT model for turning words into contextualized word embeddings, with parameters fine-tuned on the used corpus, and a fully connected output layer for performing the classification task. We examine the performance of the underlying BERT model and the proposed ICP on the Large Movie Review dataset consisting of 50000 movie reviews. The results show that the good performance of the underlying classifier is carried over to the ICP extension without any substantial accuracy loss, while the provided prediction sets are tight enough to be useful in practice.
https://proceedings.mlr.press/v128/maltoudoglou20a.html
Mixing past predictions
In the framework of the theory of prediction with expert advice, we present an algorithm for online aggregation of functional predictions. In this setting, at each time step some algorithm issues a forecast in the form of a function, and the master algorithm then combines these current and past functional forecasts into one aggregated functional forecast. We apply the proposed algorithm to the problem of long-term prediction of time series. By combining the past and current long-term functional forecasts, we obtain a smoothing mechanism that protects our algorithm from temporary changes in the trend of the time series, noise, and outliers. To evaluate the performance of the presented aggregating algorithm as a long-term forecaster, we use a new “integral” loss function and the delayed feedback approach. Finally, we apply this algorithm to regression problems and present a method for smoothing regression forecasts.
https://proceedings.mlr.press/v128/korotin20a.html
A conformalized density-based clustering analysis of malicious traffic for botnet detection
In this work, we present a clustering technique within the conformal prediction framework and describe its application to bot-generated network traffic in order to build botnet behavioral models, with a view to improving the detection of compromised hosts. The technique has a natural connection to density-based clustering. Once a required significance level has been set, this technique can discover the clusters and the noise in the data. To obtain a clustering of the underlying distribution, we use conformal prediction in combination with a density estimator, which is used for point prediction, to identify a few so-called focal points, which are the centers of possibly overlapping spheres or ellipsoids that represent the clusters. There are several advantages to the developed technique: the number of clusters is determined automatically, and the technique is able to find nonlinearly separable clusters. Moreover, a new conformity measure related to BotFinder, an algorithm for finding bots in network traffic, is developed that can be used as a method for point prediction. We performed an experimental evaluation of the proposed approach in terms of efficiency and accuracy. The results suggest that the approach obtains relatively high accuracies and is more effective when compared with previous conformal clustering techniques.
https://proceedings.mlr.press/v128/kiani20a.html
Classification of aerosol particles using inductive conformal prediction
Aerosol particles are small particles suspended in air, affecting the climate and human health. Different types of particles come from different sources and impact the environment in different ways, which is why a reliable particle classification is of interest. In this study, inductive conformal prediction is applied to a dataset of laboratory-generated aerosol particles, consisting of ten particle subclasses that can be grouped into four parent classes for classification. The performance of the inductive conformal predictor (ICP) is evaluated on particle subclasses that were not included in training or calibration. The ICP appears to give accurate predictions in some cases, namely if the unknown particle is similar to the known ones in the parent class. The precision of the underlying model is not high enough to reject all unknown particles for any subclass at the chosen significance levels, but the ICP manages to reject them at a higher rate if they are sufficiently different from the training and calibration samples. Overall, the performance is not straightforward to evaluate and it seems to depend on the heterogeneity and size of the classes of particles. Further investigations using a simpler data and model set-up would be beneficial, and data and sampling standardisation should be considered more carefully if the model is to be applied to field measurements.
https://proceedings.mlr.press/v128/karlsson20a.html
Preface
https://proceedings.mlr.press/v128/gammerman20a.html
A histogram based betting function for conformal martingales
This paper investigates the use of Conformal Martingales (CM) for providing a numerical indication of how likely it is that the exchangeability assumption holds on a set of data. Reliable and fast testing of exchangeability is an important challenge because many machine learning algorithms rely on this assumption. Therefore a technique with only a few parameters to tune, that is able to reject the exchangeability assumption with respect to a significance level, should be very beneficial for enhancing the reliability of such machine learning models. Our approach consists of a CM whose betting function is estimated from the previously seen p-values; we compare its computational efficiency and its performance with a kernel betting function and the Kolmogorov-Smirnov test. We test our approach on two benchmark datasets, USPS and Statlog Satellite data.
https://proceedings.mlr.press/v128/eliades20a.html
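A conformal martingale with a histogram betting function can be sketched as follows. The add-one smoothing, bin count, and warm-up length are illustrative choices, not taken from the paper:

```python
import numpy as np

def histogram_martingale(p_values, n_bins=10, warmup=20):
    """Conformal martingale whose betting function is a histogram density
    estimated from previously seen p-values. Under exchangeability the
    p-values are uniform on [0, 1] and the martingale stays small; a large
    final value is evidence against exchangeability (by Ville's inequality).
    Returns the log-martingale path."""
    log_m = 0.0
    history, path = [], []
    for p in p_values:
        if len(history) >= warmup:
            counts, edges = np.histogram(history, bins=n_bins, range=(0.0, 1.0))
            # Add-one smoothed density; integrates to 1, so it is a
            # legitimate betting function.
            density = (counts + 1.0) / ((len(history) + n_bins) * (1.0 / n_bins))
            b = np.searchsorted(edges, p, side="right") - 1
            b = min(max(b, 0), n_bins - 1)
            log_m += np.log(density[b])
        history.append(p)
        path.append(log_m)
    return np.array(path)

# Uniform p-values (exchangeable) vs. p-values skewed towards 0.
rng = np.random.default_rng(3)
path_u = histogram_martingale(rng.uniform(size=1000))
path_s = histogram_martingale(rng.beta(0.2, 1.0, size=1000))
```

On the skewed sequence the histogram quickly learns to bet on the low bins, so the log-martingale grows roughly linearly, while on the uniform sequence it stays near zero.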
Batch mode active learning for mitotic phenotypes using conformal prediction
Machine learning models are now ubiquitous in all areas of data analysis. As the amount of data generated continues to increase exponentially, the task of annotating sufficient objects with known labels by an expert remains expensive. To mitigate this, active learning approaches attempt to identify those objects whose labels will be most informative. Here, we introduce a batch-based active learning framework in a pooled setting based around conformal predictors. We select objects to add to the labelled observations based on perceived novelty, while mitigating the risks of selecting highly correlated or outlying observations. We compare our approach to classical methods using an example UCI dataset, and demonstrate its application to a pharmaceutically relevant cellular imaging problem for classifying mitotic phenotypes. Our approach facilitates efficient discovery of rare and novel classes within large screening datasets.
https://proceedings.mlr.press/v128/corrigan20a.html
Training conformal predictors
Efficiency criteria for conformal prediction, such as observed fuzziness (i.e., the sum of p-values associated with false labels), are commonly used to evaluate the performance of given conformal predictors. Here, we investigate whether it is possible to exploit efficiency criteria to learn classifiers, both conformal predictors and point classifiers, by using such criteria as training objective functions. The proposed idea is implemented for the problem of binary classification of hand-written digits. By choosing a 1-dimensional model class (with one real-valued free parameter), we can solve the optimization problems through an (approximate) exhaustive search over (a discrete version of) the parameter space. Our empirical results suggest that conformal predictors trained by minimizing their observed fuzziness perform better than conformal predictors trained in the traditional way by minimizing the prediction error of the corresponding point classifier. They also have reasonable performance in terms of their prediction error on the test set.
https://proceedings.mlr.press/v128/colombo20a.html
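The idea of minimizing observed fuzziness over a 1-dimensional model class by discretised exhaustive search can be sketched as follows. The toy data, the sigmoid scoring model, and the particular p-value computation are assumptions for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D toy binary problem: class 1 tends to have larger x.
n = 200
y = rng.integers(0, 2, n)
x = rng.normal(loc=2.0 * y, scale=1.5)

def observed_fuzziness(theta, x, y):
    """Sum of p-values of the false labels: the efficiency criterion
    used here as a training objective. Smaller is better (false labels
    look more nonconforming)."""
    score = 1.0 / (1.0 + np.exp(-theta * x))     # model's P(label = 1 | x)
    # Nonconformity of every example under its true label.
    a_true = np.abs(y - score)
    fuzz = 0.0
    for i in range(len(x)):
        # Nonconformity of example i under the false label.
        a_i = np.abs((1 - y[i]) - score[i])
        # p-value of the false label against the true-label scores.
        p = (np.sum(a_true >= a_i) + 1) / (len(x) + 1)
        fuzz += p
    return fuzz

# Approximate exhaustive search over a discretised 1-D parameter space.
grid = np.linspace(-3, 3, 121)
best_theta = min(grid, key=lambda t: observed_fuzziness(t, x, y))
```

At theta = 0 every example gets the score 0.5, all nonconformity scores coincide, and every false-label p-value is 1; any theta that separates the classes pushes the objective well below that.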
Mondrian conformal regressors
Standard (non-normalized) conformal regressors produce intervals that are of identical size and hence non-informative in the sense that they provide no information about the uncertainty at the instance level. A common approach to handle this limitation is to normalize the produced interval using a difficulty estimate, which results in larger intervals for instances judged to be more difficult and smaller intervals for instances judged to be easier. A problem with this approach is identified: when the difficulty estimation function provides little or no information about the true error at the instance level, one would expect the predicted intervals to be more similar in size compared to when using a more accurate difficulty estimation function. However, experiments on both synthetic and real-world datasets show the opposite. Moreover, the intervals produced by normalized conformal regressors may be several times larger than the largest previously observed prediction error, which clearly is counter-intuitive. To alleviate these problems, we propose Mondrian conformal regressors, which partition the calibration instances into a number of categories, before generating one prediction interval for each category, using a standard conformal regressor. Here, binning of the difficulty estimates is employed for the categorization. In contrast to normalized conformal regressors, Mondrian conformal regressors can never produce intervals that are larger than twice the largest observed error. The experiments verify that the resulting regressors are valid and as efficient as when using normalization, while being significantly more efficient than the standard variant. Most importantly, the experiments show that Mondrian conformal regressors, in contrast to normalized conformal regressors, have the desired property that the variance of the size of the predicted intervals correlates positively with the accuracy of the function that is used to estimate difficulty.
https://proceedings.mlr.press/v128/bostrom20a.html
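The binning-based categorisation can be sketched as follows. Equal-frequency binning and the absolute-residual score are illustrative choices; they are not necessarily the paper's exact setup:

```python
import numpy as np

def mondrian_intervals(residuals_cal, difficulty_cal, difficulty_test,
                       n_bins=5, alpha=0.1):
    """Mondrian conformal regression: partition calibration instances by
    binned difficulty estimates and compute one interval half-width per
    bin with a standard conformal regressor. Because each half-width is
    an order statistic of the residuals in its bin, no interval can
    exceed twice the largest observed error."""
    # Equal-frequency bin edges from the calibration difficulties.
    edges = np.quantile(difficulty_cal, np.linspace(0, 1, n_bins + 1))
    bins_cal = np.clip(np.searchsorted(edges[1:-1], difficulty_cal),
                       0, n_bins - 1)
    half_widths = np.empty(n_bins)
    for b in range(n_bins):
        r = np.sort(residuals_cal[bins_cal == b])
        k = int(np.ceil((len(r) + 1) * (1 - alpha)))
        half_widths[b] = r[min(k, len(r)) - 1]
    bins_test = np.clip(np.searchsorted(edges[1:-1], difficulty_test),
                        0, n_bins - 1)
    return half_widths[bins_test]

# Toy usage: residual magnitudes genuinely scale with the difficulty
# estimate, so harder test instances should get wider intervals.
rng = np.random.default_rng(5)
d_cal = rng.uniform(0.1, 1.0, 2000)
res_cal = np.abs(rng.normal(0, 1, 2000)) * d_cal
w = mondrian_intervals(res_cal, d_cal, np.array([0.15, 0.95]))
```
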
Constructing normalized nonconformity measures based on maximizing predictive efficiency
The problem of regression in the inductive conformal prediction framework is addressed to provide prediction intervals that are optimized by predictive efficiency. A differentiable function is used to approximate the exact optimization problem of minimizing predictive inefficiency on a training data set using a conformal predictor based on a parametric normalized nonconformity measure. Gradient descent is then used to find a solution. Since the optimization approximates the conformal predictor, this method is called surrogate conformal predictor optimization. Experiments are reported that show that it results in conformal predictors that provide improved predictive efficiency for regression problems on several data sets, whilst remaining reliable. It is also shown that the optimal parameter values typically differ for different confidence levels. Using house price data, alternative measures of inefficiency are explored to address different application requirements.
https://proceedings.mlr.press/v128/bellotti20a.html
Practical investment with the long-short game
In this paper we apply the aggregating algorithm, an on-line prediction with expert advice algorithm, to real-world foreign exchange trading data with the aim of finding investment strategies with optimal returns. We consider the Long-Short game first introduced in Vovk and Watkins (1998) and its implementation, including the derivation of expert predictions from model trading data. Furthermore, we propose modifications to improve the practical performance of the game with respect to well-known portfolio performance indicators.
https://proceedings.mlr.press/v128/al-baghdadi20a.html