Proceedings of Machine Learning Research
Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications on 13-16 June 2017
Published as Volume 60 by the Proceedings of Machine Learning Research on 31 May 2017.
Volume Edited by:
Alex Gammerman
Vladimir Vovk
Zhiyuan Luo
Harris Papadopoulos
Series Editors:
Neil D. Lawrence
Mark Reid
http://proceedings.mlr.press/v60/
Fri, 02 Jun 2017 04:55:17 +0000
Asymptotic Properties of Nonparametric Estimation on Manifold
In many applications, real high-dimensional data occupy only a very small part of the high-dimensional ‘observation space’,
and the intrinsic dimension of that part is small.
The most popular model of such data is the Manifold model, which assumes that the data lie on or near an unknown manifold (Data Manifold, DM) of lower dimensionality
embedded in an ambient high-dimensional input space (the Manifold Assumption about high-dimensional data).
Manifold Learning is a Dimensionality Reduction problem under the Manifold assumption about the processed data,
and its goal is to construct a low-dimensional parameterization of the DM (global low-dimensional coordinates on the DM)
from a finite dataset sampled from the DM.
The Manifold Assumption means that a local neighborhood of each manifold point is equivalent to a region of low-dimensional Euclidean space.
Because of this, most Manifold Learning algorithms include two parts:
a ‘local part’, in which certain characteristics reflecting the low-dimensional local structure of the neighborhoods of all sample points
are constructed via nonparametric estimation,
and a ‘global part’, in which global low-dimensional coordinates on the DM are constructed
by solving a convex optimization problem for a specific cost function that depends on the local characteristics.
The statistical properties of both the ‘local part’ and its average over the manifold are considered in the paper.
The article extends the paper (Yanovich, 2016) to the case of nonparametric estimation.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/yanovich17a.html
CP-RA$k$EL: Improving Random $k$-labelsets with Conformal Prediction for Multi-label Classification
Multi-label conformal prediction has attracted much attention in the conformal predictor (CP) community.
In this article, we propose to combine CP with the random $k$-labelsets (RA$k$EL) method,
which is a state-of-the-art multi-label classification method for a large number of labels.
In the framework of RA$k$EL, the original problem is reduced to a number of small-sized multi-label classification tasks
by randomly breaking the initial set of labels into a number of small-sized labelsets,
and then the label powerset (LP) method is applied to each of these tasks.
In this work, ICP-RF, an inductive conformal predictor based on random forest,
is used in each multi-label task in order to get p-values for predictions of the LP model,
and then the predictions are aggregated to get a final result.
Experimental results on six benchmark datasets empirically demonstrate the calibration property of ICP-RF as LP models,
and show that conformal prediction can significantly improve the performance of RA$k$EL; the resulting approach is called CP-RA$k$EL.
However, the validity property of CP does not hold in CP-RA$k$EL.
In future work we will study how to use new CP techniques to calibrate the new method.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/yang17a.html
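The abstract above relies on an inductive conformal predictor to attach p-values to candidate predictions. As a minimal sketch of that step only, the following shows the standard inductive conformal p-value computed from nonconformity scores. The score values and the labelset names are hypothetical, and the random-forest nonconformity measure used in the paper is not reproduced here.

```python
def icp_p_value(calibration_scores, test_score):
    """Inductive conformal p-value for one candidate label (or labelset).

    calibration_scores: nonconformity scores of held-out calibration examples.
    test_score: nonconformity score of the test example under the candidate.
    """
    n = len(calibration_scores)
    # Count calibration examples that look at least as strange as the test example.
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    # The "+ 1" accounts for the test example itself.
    return (at_least_as_strange + 1) / (n + 1)


# Hypothetical scores, for illustration only.
calibration = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.9, 0.5, 0.3]
candidate_scores = {"labelset_A": 0.25, "labelset_B": 0.85}

p_values = {name: icp_p_value(calibration, s) for name, s in candidate_scores.items()}
print(p_values)  # {'labelset_A': 0.8, 'labelset_B': 0.2}
```

A candidate is kept in the prediction set at significance level epsilon when its p-value exceeds epsilon. How the per-labelset p-values are then aggregated across the RA$k$EL tasks is specific to the paper and is not shown.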
Online Aggregation of Unbounded Signed Losses Using Shifting Experts
For the decision theoretic online (DTOL) setting,
we consider methods to construct algorithms that suffer loss not much more than that of any sequence of experts
distributed along a time interval (shifting experts setting).
We present a modified version of the method of Mixing Past Posteriors
which uses AdaHedge with an adaptive learning rate as its basic algorithm.
This combines the advantages of both algorithms:
the regret bounds remain valid for signed unbounded losses of the experts,
and we use the shifting regret, which is a more suitable characteristic of the algorithm.
All results are obtained in the adversarial setting: no assumptions are made about the nature of the data source.
We present results of numerical experiments for the case where losses of the experts cannot be bounded in advance.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/v-yugin17a.html
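To illustrate the shifting-experts idea in the abstract above, here is a minimal sketch of the classical Fixed Share update (Herbster and Warmuth), a simpler relative of Mixing Past Posteriors. It is not the paper's algorithm: the learning rate `eta` is a fixed constant rather than AdaHedge's adaptive rate, and `eta`, `alpha` and the loss values are assumptions chosen for illustration.

```python
import math


def fixed_share_update(weights, losses, eta=0.5, alpha=0.05):
    """One round of Fixed Share over a list of expert weights.

    weights: current probability weights over experts (sum to 1).
    losses: this round's expert losses; may be negative (signed).
    eta: constant learning rate (assumption; the paper adapts it).
    alpha: share rate, the mass redistributed uniformly each round.
    """
    # Shift by the minimum loss for numerical stability; the normalized
    # weights are unchanged by a common shift.
    m = min(losses)
    scaled = [w * math.exp(-eta * (loss - m)) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    posterior = [s / total for s in scaled]
    # Mixing step: keep every expert's weight above alpha / n so that
    # a previously poor expert can recover quickly after a shift.
    n = len(weights)
    return [(1 - alpha) * p + alpha / n for p in posterior]


# Expert 0 is best for five rounds, then expert 2 becomes best.
weights = [1 / 3] * 3
rounds = [[-1.0, 0.5, 2.0]] * 5 + [[3.0, 0.5, -4.0]] * 5
for round_losses in rounds:
    weights = fixed_share_update(weights, round_losses)
print(weights)
```

Without the mixing step, the weight of expert 2 would decay exponentially during the first five rounds and take correspondingly long to recover. The uniform floor is what lets the aggregate track a sequence of experts rather than a single one.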
Nonparametric predictive distributions based on conformal prediction
This paper applies conformal prediction to derive predictive distributions that are valid under a nonparametric assumption.
Namely, we introduce and explore predictive distribution functions that always satisfy a natural property of validity
in terms of guaranteed coverage for IID observations.
The focus is on a prediction algorithm that we call the Least Squares Prediction Machine (LSPM).
The LSPM generalizes the classical Dempster–Hill predictive distributions to regression problems.
If the standard parametric assumptions for Least Squares linear regression hold,
the LSPM is as efficient as the Dempster–Hill procedure, in a natural sense.
And if those parametric assumptions fail, the LSPM is still valid, provided the observations are IID.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/vovk17a.html
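As a rough illustration of the no-covariates case that the abstract above generalizes, the sketch below evaluates a randomized predictive distribution function of the Dempster–Hill type for IID real observations. It assumes the conformity score is simply the observation itself and that `tau` is a randomization value in [0, 1]. The function name and the data are hypothetical, and the LSPM itself (which adjusts for regression) is not implemented here.

```python
def dempster_hill_cdf(observations, y, tau=0.5):
    """Randomized predictive distribution function evaluated at y.

    observations: past IID real-valued observations y_1, ..., y_n.
    y: point at which to evaluate the predictive distribution.
    tau: randomization value in [0, 1] (assumption: supplied by the caller).
    """
    n = len(observations)
    below = sum(1 for v in observations if v < y)
    ties = sum(1 for v in observations if v == y)
    # The "+ 1" counts the hypothetical new observation placed at y.
    return (below + tau * (ties + 1)) / (n + 1)


data = [1.0, 2.0, 3.0]  # hypothetical observations
print(dempster_hill_cdf(data, 2.5, tau=0.5))
```

Between two consecutive order statistics the function takes values in a band of width 1/(n + 1) as `tau` varies, so each gap between adjacent observations receives predictive probability 1/(n + 1).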
Inductive Conformal Martingales for Change-Point Detection
We consider the problem of quickest change-point detection in data streams.
Classical change-point detection procedures, such as CUSUM, Shiryaev-Roberts and Posterior Probability statistics,
are optimal only if the change-point model is known, which is an unrealistic assumption in typical applied problems.
Instead we propose a new method for change-point detection based on Inductive Conformal Martingales,
which requires only that the observations be independent and identically distributed.
We compare the proposed approach to standard methods,
as well as to change-point detection oracles, which model a typical practical situation
in which we have only imprecise (albeit parametric) information about pre- and post-change data distributions.
The comparison provides evidence that change-point detection based on Inductive Conformal Martingales is an efficient tool
that, unlike traditional approaches, works under quite general conditions.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/volkhonskiy17a.html
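The general mechanism behind a conformal martingale can be sketched as follows: conformal p-values are fed into a betting function, the running product grows when the p-values stop looking uniform, and an alarm is raised when it crosses a threshold. The sketch assumes the simple power betting function `epsilon * p ** (epsilon - 1)`. The paper's own betting functions, the threshold and the p-values below are not taken from the source and are illustrative only.

```python
def power_martingale(p_values, epsilon=0.5):
    """Running values of the power martingale over a stream of p-values.

    Each p-value must lie in (0, 1]; inductive conformal p-values always do.
    The betting function epsilon * p ** (epsilon - 1) integrates to 1 on [0, 1],
    so the product is a martingale when the p-values are uniform.
    """
    value = 1.0
    path = []
    for p in p_values:
        value *= epsilon * p ** (epsilon - 1.0)
        path.append(value)
    return path


def first_alarm(path, threshold):
    """Index of the first step at which the martingale reaches the threshold."""
    for step, value in enumerate(path):
        if value >= threshold:
            return step
    return None


# Hypothetical stream: unremarkable p-values, then very small ones after a change.
stream = [0.5, 0.5, 0.01, 0.01, 0.01]
path = power_martingale(stream)
print(first_alarm(path, threshold=20.0))
```

By Ville's inequality, a nonnegative martingale started at 1 reaches a threshold c with probability at most 1/c under the IID assumption, which is what bounds the false alarm rate.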
Combination of Conformal Predictors for Classification
The paper presents some possible approaches to the combination of Conformal Predictors in the binary classification case.
A first class of methods is based on p-value combination techniques that have been proposed in the context of Statistical Hypothesis Testing;
a second class is based on the calibration of p-values into Bayes factors. A few methods from these two classes are applied to a real-world case,
namely the chemoinformatics problem of Compound Activity Prediction.
Their performance is discussed, showing their different abilities to preserve validity and improve efficiency.
The experiments show that p-value combination, in particular Fisher’s method, can be advantageous when ranking compounds by strength of evidence.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/toccaceli17a.html
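Fisher's method, mentioned in the abstract above, can be sketched in a few lines. The statistic is minus twice the sum of the log p-values, which follows a chi-square distribution with 2k degrees of freedom when the k p-values are independent and uniform. That independence is an assumption of the classical method and need not hold for p-values produced by conformal predictors trained on the same data. The input values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine k p-values with Fisher's method.

    Returns the upper tail probability of a chi-square distribution with
    2k degrees of freedom at the statistic -2 * sum(log p).
    """
    k = len(p_values)
    half_statistic = -sum(math.log(p) for p in p_values)  # statistic / 2
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_statistic / i
        total += term
    return math.exp(-half_statistic) * total


print(fisher_combine([0.05, 0.05]))
```

For two p-values the formula reduces to `p1 * p2 * (1 - log(p1 * p2))`, which makes it easy to check by hand. A single p-value is returned unchanged.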
Improving Reliable Probabilistic Prediction by Using Additional Knowledge
The Venn Machine is a recently developed machine learning framework for reliable probabilistic prediction of the labels for new examples.
This work proposes a way to extend the Venn Machine to the framework known as Learning Under Privileged Information:
some additional features are available for a part of the training set, and are missing for the example being predicted.
We make use of this information through a taxonomy transfer, where the taxonomy is the core component of the Venn Machine framework.
The transfer is done from the examples with additional information to the examples without additional information.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/nouretdinov17b.html
Reverse Conformal Approach for On-line Experimental Design
Conformal prediction is a recently developed framework of confident machine learning with guaranteed validity properties for prediction sets.
In this work we study its use in a reversed version of the traditional machine learning problem:
prediction of objects that can have a given label, instead of the usual prediction of labels for objects.
The label is meant to reflect some desired property of the object.
For this kind of task, the conformal prediction framework can provide a prediction set that is a set of objects that are likely to have the label.
Based on this, we create an on-line protocol of experimental design.
It includes a choice criterion based on conformal output, and elements of transfer learning in order to keep the validity properties in the on-line regime.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/nouretdinov17a.html
Maximizing Gain in HTS Screening Using Conformal Prediction
Today, screening of large compound collections in high throughput screening campaigns forms the backbone of early drug discovery.
Although widely applied, this approach is resource-intensive and potentially labour-intensive.
Therefore, improved computational approaches to streamline screening are in high demand.
In this study we introduce conformal prediction paired with a gain-cost function to make predictions
in order to maximise the gain of screening campaigns on new screening sets.
Our results indicate that using 20% of the screening library as an initial screening set
and using the data obtained together with a gain-cost function,
the significance level of the predictor that maximises the gain can be identified.
Importantly, the parameters for the predictor derived from the initial screening set were highly predictive of the maximal gain also on the remaining data.
Using this approach, the gain of a screening campaign can be improved considerably.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/norinder17a.html
Multi-class probabilistic classification using inductive and cross Venn–Abers predictors
Inductive (IVAP) and cross (CVAP) Venn–Abers predictors are computationally efficient algorithms for probabilistic prediction in binary classification problems.
We present a new approach to multi-class probability estimation by turning IVAPs and CVAPs into multi-class probabilistic predictors.
The proposed multi-class predictors are experimentally more accurate than both uncalibrated predictors and existing calibration methods.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/manokhin17a.html
SCOT Approximation, Training and Asymptotic Inference
Approximation of stationary strongly mixing processes by Stochastic Context Trees (SCOT) models
and the Le Cam-Hajek-Ibragimov-Khasminsky locally minimax theory of statistical inference for them are outlined.
SCOT is an $m$-Markov model with sparse memory structure.
In our previous papers we proved that SCOT is equivalent to a 1-MC whose state space is the alphabet consisting of the SCOT contexts.
For a fixed alphabet size and growing sample size, Local Asymptotic Normality is proved and applied to establish asymptotically optimal inference.
We outline what obstacles arise for a large SCOT alphabet size and not necessarily vast sample size.
Training SCOT on a large string using clusters of computers is described, along with statistical applications.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/malyutov17a.html
On the Calibration of Aggregated Conformal Predictors
Conformal prediction is a learning framework that produces models that associate with each of their predictions a measure of statistically valid confidence.
These models are typically constructed on top of traditional machine learning algorithms.
An important result of conformal prediction theory is that the models produced are provably valid under relatively weak assumptions—in particular,
their validity is independent of the specific underlying learning algorithm on which they are based.
Since validity is automatic, much research on conformal predictors has been focused on improving their informational and computational efficiency.
As part of the efforts in constructing efficient conformal predictors, aggregated conformal predictors were developed,
drawing inspiration from the field of classification and regression ensembles.
Unlike early definitions of conformal prediction procedures, the validity of aggregated conformal predictors is not fully understood—while it has been shown
that they might attain empirical exact validity under certain circumstances,
their theoretical validity is conditional on additional assumptions that require further clarification.
In this paper, we show why validity is not automatic for aggregated conformal predictors,
and provide a revised definition of aggregated conformal predictors that gains approximate validity
conditional on properties of the underlying learning algorithm.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/linusson17a.html
Conformal $k$-NN Anomaly Detector for Univariate Data Streams
Anomalies in time-series data give essential and often actionable information in many applications.
In this paper we consider a model-free anomaly detection method for univariate time-series
which adapts to non-stationarity in the data stream and provides probabilistic abnormality scores based on the conformal prediction paradigm.
Despite its simplicity, the method performs on par with complex prediction-based models
on the Numenta Anomaly Detection benchmark and the Yahoo! S5 dataset.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/ishimtsev17a.html
Preface
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/gammerman17a.html
Conformal Prediction for Automatic Face Recognition
Automatic Face Recognition (AFR) has been the subject of many research studies in the past two decades and has a wide range of applications.
The provision of some kind of indication of the likelihood of a recognition being correct is a desirable property of AFR techniques in many applications,
such as for the detection of wanted persons or for performing post-processing in automatic annotation of photographs.
This paper investigates the use of the Conformal Prediction (CP) framework for providing reliable confidence information for AFR.
In particular, we combine CP with two classifiers based on calculating similarities between images using Scale-Invariant Feature Transform (SIFT) features.
We examine and compare the performance of several nonconformity measures for the particular task in terms of their accuracy and informational efficiency.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/eliades17a.html
Comparing Performance of Different Inductive and Transductive Conformal Predictors Relevant to Drug Discovery
We present an evaluation of the impact of transductive, inductive, aggregated and cross inductive Mondrian conformal prediction
on the validity and efficiency of predictions.
The aim of the study is to give guidance on which methods perform best when data are limited.
The evaluation has been made on a large public dataset of Ames mutagenicity data, relevant for drug discovery,
a spam dataset and a diverse set of drug discovery datasets.
When considering predictions only, the transductive conformal predictor performs the best in terms of validity.
If, however, more information is required, for example interpretation of a prediction,
then any of the methods that calculate an averaged p-value should be considered.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/carlsson17a.html
Prediction of Metabolic Transformations using Cross Venn-ABERS Predictors
Prediction of drug metabolism is an important topic in the drug discovery process,
and here we present a study using probabilistic predictions, applying Cross Venn-ABERS Predictors (CVAPs) to site-of-metabolism data.
We used a dataset of 73599 biotransformations, applied SMIRKS to define biotransformations of interest and constructed five datasets
where chemical structures were represented using signatures descriptors.
The results show that CVAP produces well-calibrated predictions for all datasets with good predictive capability,
making CVAP an interesting method for further exploration in drug discovery applications.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/arvidsson17a.html
Using Conformal Prediction to Prioritize Compound Synthesis in Drug Discovery
The choice of how much money and resources to spend to understand certain problems is of high interest in many areas.
This work illustrates how computational models can be more tightly coupled with experiments
to generate decision data at lower cost without reducing the quality of the decision.
Several different strategies are explored to illustrate the trade-off between lowering costs and maintaining decision quality.
AUC is used as a performance metric and the number of objects that can be learnt from is constrained.
Some of the strategies described reach AUC values over 0.9 and outperform strategies that are more random.
The strategies that use conformal predictor p-values show varying results, although some are top performing.
Wed, 31 May 2017 00:00:00 +0000
http://proceedings.mlr.press/v60/ahlberg17a.html