<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Proceedings of Machine Learning Research</title>
    <description>Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications
  Held at ECML-PKDD, Dublin, Ireland on 10 September 2018

Published as Volume 94 by the Proceedings of Machine Learning Research on 05 November 2018.

Volume Edited by:
  Luís Torgo
  Stan Matwin
  Nathalie Japkowicz
  Bartosz Krawczyk
  Nuno Moniz
  Paula Branco

Series Editors:
  Neil D. Lawrence
  Mark Reid
</description>
    <link>https://proceedings.mlr.press/v94/</link>
    <atom:link href="https://proceedings.mlr.press/v94/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Wed, 08 Feb 2023 10:44:36 +0000</pubDate>
    <lastBuildDate>Wed, 08 Feb 2023 10:44:36 +0000</lastBuildDate>
    <generator>Jekyll v3.9.3</generator>
    
      <item>
        <title>2nd Workshop on Learning with Imbalanced Domains: Preface</title>
        <description></description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/torgo18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/torgo18a.html</guid>
        
        
      </item>
    
      <item>
        <title>Multi-label kNN Classifier with Self Adjusting Memory for Drifting Data Streams</title>
        <description>Learning from multi-label data streams is a highly challenging task involving drifts in features and labels. Classifiers must automatically adapt to changes while maintaining competitive accuracy in a real-time dynamic environment where the frequencies of the labelsets are non-stationary and highly imbalanced. This paper presents a multi-label k-Nearest Neighbors (kNN) classifier with Self-Adjusting Memory (SAM) for drifting data streams (ML-SAM-kNN). It exploits short- and long-term memories to predict the current and evolving states of the data stream. The experimental study compares the proposal with eight other multi-label classifiers for data streams on 23 datasets, using six multi-label metrics as well as evaluation time and memory consumption. Non-parametric statistical analysis of the results shows the superiority of ML-SAM-kNN, including when compared with ML-kNN.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/roseberry18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/roseberry18a.html</guid>
        
        
      </item>
    
      <item>
        <title>On the Need of Class Ratio Insensitive Drift Tests for Data Streams</title>
        <description>Early approaches to detecting concept drifts in data streams without actual class labels aim at minimizing external labeling costs. However, their functionality is dubious when presented with changes in the proportion of the classes over time, as such methods keep reporting concept drifts that would not damage the performance of the current classification model. In this paper, we present an approach that detects changes in the distribution of the features while remaining insensitive to changes in the distribution of the classes. The method also provides an estimate of the current class ratio and uses it to adapt the threshold of a classification model trained with balanced data. We show that the classification performance achieved by such a modified classifier is greater than that of a classifier trained with the same class distribution as the current imbalanced data.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/maletzke18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/maletzke18a.html</guid>
        
        
      </item>
    
      <item>
        <title>ImWeights: Classifying Imbalanced Data Using Local and Neighborhood Information</title>
        <description>Preprocessing methods for imbalanced data transform the training data to a form more suitable for learning classifiers. Most of these methods either focus on local relationships between single training examples or analyze the global characteristics of the data, such as the class imbalance ratio in the dataset. However, they do not sufficiently exploit the combination of both these views. In this paper, we put forward a new data preprocessing method called ImWeights, which weights training examples according to their local difficulty (safety) and their vicinity to larger minority clusters (gravity). Experiments with real-world datasets show that ImWeights is on par with local and global preprocessing methods, while being the least memory-intensive. The introduced notion of minority cluster gravity opens new lines of research for specialized preprocessing methods and classifier modifications for imbalanced data.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/lango18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/lango18a.html</guid>
        
        
      </item>
    
      <item>
        <title>Undersampled Majority Class Ensemble for highly imbalanced binary classification</title>
        <description>This work utilizes an ensemble approach to solve the problem of highly imbalanced data classification. The paper proposes UMCE, a multiple classifier system based on a k-fold division of the majority class that creates a pool of classifiers, breaking one imbalanced problem into many balanced ones while ensuring the presence of all available samples in the training procedure. The algorithm, together with five proposed fusers and a pruning method based on the statistical dependencies of the classifiers' responses on the testing set, was evaluated in computer experiments carried out on benchmark datasets with two different base classifiers.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/ksieniewicz18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/ksieniewicz18a.html</guid>
        
        
      </item>
    
      <item>
        <title>Proper Losses for Learning with Example-Dependent Costs</title>
        <description>We study the design of cost-sensitive learning algorithms with example-dependent costs, when cost matrices for each example are given both during training and test. The approach is based on the empirical risk minimization framework, where we replace the standard loss function by a combination of surrogate losses belonging to the family of proper losses. The actual contribution of each example to the risk is then given by a loss that depends on the cost matrix for the specific example. We then evaluate the use of such example-dependent loss functions in real-world binary and multiclass problems, namely credit risk assessment and musical genre classification. Using different neural network architectures, we show that with the appropriate choice of the example-dependent losses, we can outperform conventional cost-sensitive methods in terms of total cost, making a more efficient use of cost information during training and test as compared to existing discriminative approaches.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/hepburn18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/hepburn18a.html</guid>
        
        
      </item>
    
      <item>
        <title>Non-Linear Gradient Boosting for Class-Imbalance Learning</title>
        <description>Gradient boosting relies on linearly combining diverse and weak hypotheses to build a strong classifier. In the class imbalance setting, boosting algorithms often require many hypotheses, which tend to be more complex and may increase the risk of overfitting. We propose in this paper to address this issue by adapting the gradient boosting framework to a non-linear setting. In order to learn the idiosyncrasies of the target concept and prevent the algorithm from being biased toward the majority class, we suggest jointly learning different combinations of the same set of very weak classifiers and expanding the expressiveness of the final model by leveraging their non-linear complementarity. We perform an extensive experimental study using decision trees and show that, while requiring far fewer weak learners with a lower complexity (fewer splits per tree), our model outperforms standard linear gradient boosting.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/frery18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/frery18a.html</guid>
        
        
      </item>
    
      <item>
        <title>REBAGG: REsampled BAGGing for Imbalanced Regression</title>
        <description>The problem of imbalanced domains is important in multiple real-world applications. This problem has been thoroughly studied for classification tasks. In particular, the adaptation of ensembles to tackle imbalanced domains has shown important advantages in a classification context. Still, for imbalanced regression problems only a few solutions exist. Moreover, the capabilities of ensembles for dealing with imbalanced regression tasks are yet to be explored. In this paper we present the REsampled BAGGing (REBAGG) algorithm, a bagging-based ensemble method that incorporates data pre-processing strategies for addressing imbalanced domains in regression tasks. The extensive experimental evaluation conducted shows the advantage of our proposal across a diverse set of domains and learning algorithms.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/branco18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/branco18a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning from Positive and Unlabeled Data under the Selected At Random Assumption</title>
        <description>For many interesting tasks, such as medical diagnosis and web page classification, a learner only has access to some positively labeled examples and many unlabeled examples. Learning from this type of data requires making assumptions about the true distribution of the classes and/or the mechanism that was used to select the positive examples to be labeled. The commonly made assumptions, separability of the classes and positive examples being selected completely at random, are very strong. This paper proposes a weaker assumption: that the positive examples are selected at random, conditioned on some of the attributes. To learn under this assumption, an EM method is proposed. Experiments show that our method is not only very capable of learning under this assumption, but also outperforms the state of the art for learning under the selected-completely-at-random assumption.</description>
        <pubDate>Mon, 05 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v94/bekker18a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v94/bekker18a.html</guid>
        
        
      </item>
    
  </channel>
</rss>
