- title: '2nd Workshop on Learning with Imbalanced Domains: Preface'
volume: 94
URL: http://proceedings.mlr.press/v94/torgo18a.html
PDF: http://proceedings.mlr.press/v94/torgo18a/torgo18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-torgo18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 1-7
id: torgo18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 1
lastpage: 7
published: 2018-11-05 00:00:00 +0000
- title: 'Learning from Positive and Unlabeled Data under the Selected At Random Assumption'
abstract: 'For many interesting tasks, such as medical diagnosis and web page classification, a learner only has access to some positively labeled examples and many unlabeled examples. Learning from this type of data requires making assumptions about the true distribution of the classes and/or the mechanism that was used to select the positive examples to be labeled. The commonly made assumptions, separability of the classes and positive examples being selected completely at random, are very strong. This paper proposes a weaker assumption that assumes the positive examples to be selected at random, conditioned on some of the attributes. To learn under this assumption, an EM method is proposed. Experiments show that our method is not only very capable of learning under this assumption, but it also outperforms the state of the art for learning under the selected completely at random assumption.'
volume: 94
URL: http://proceedings.mlr.press/v94/bekker18a.html
PDF: http://proceedings.mlr.press/v94/bekker18a/bekker18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-bekker18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Bekker
given: Jessa
- family: Davis
given: Jesse
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 8-22
id: bekker18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 8
lastpage: 22
published: 2018-11-05 00:00:00 +0000
- title: 'Multi-label kNN Classifier with Self Adjusting Memory for Drifting Data Streams'
abstract: 'Classifying multi-label data streams is a highly challenging task involving drifts in features and labels. Classifiers must automatically adapt to changes while keeping a competitive accuracy in a real-time dynamic environment where the frequencies of the labelsets are non-stationary and highly imbalanced. This paper presents a multi-label k Nearest Neighbor (kNN) with Self Adjusting Memory (SAM) for drifting data streams (ML-SAM-kNN). It exploits short- and long-term memories to predict the current and evolving states of the data stream. The experimental study compares the proposal with eight other multi-label classifiers for data streams on 23 datasets using six multi-label metrics, evaluation time, and memory consumption. Non-parametric statistical analysis of the results shows the superiority of ML-SAM-kNN, including when compared with ML-kNN.'
volume: 94
URL: http://proceedings.mlr.press/v94/roseberry18a.html
PDF: http://proceedings.mlr.press/v94/roseberry18a/roseberry18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-roseberry18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Roseberry
given: Martha
- family: Cano
given: Alberto
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 23-37
id: roseberry18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 23
lastpage: 37
published: 2018-11-05 00:00:00 +0000
- title: 'Non-Linear Gradient Boosting for Class-Imbalance Learning'
abstract: 'Gradient boosting relies on linearly combining diverse and weak hypotheses to build a strong classifier. In the class imbalance setting, boosting algorithms often require many hypotheses which tend to be more complex and may increase the risk of overfitting. In this paper, we address this issue by adapting the gradient boosting framework to a non-linear setting. In order to learn the idiosyncrasies of the target concept and prevent the algorithm from being biased toward the majority class, we propose to jointly learn different combinations of the same set of very weak classifiers and expand the expressiveness of the final model by leveraging their non-linear complementarity. We perform an extensive experimental study using decision trees and show that, while requiring far fewer weak learners of lower complexity (fewer splits per tree), our model outperforms standard linear gradient boosting.'
volume: 94
URL: http://proceedings.mlr.press/v94/frery18a.html
PDF: http://proceedings.mlr.press/v94/frery18a/frery18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-frery18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Frery
given: Jordan
- family: Habrard
given: Amaury
- family: Sebban
given: Marc
- family: He-Guelton
given: Liyun
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 38-51
id: frery18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 38
lastpage: 51
published: 2018-11-05 00:00:00 +0000
- title: 'Proper Losses for Learning with Example-Dependent Costs'
abstract: 'We study the design of cost-sensitive learning algorithms with example-dependent costs, when cost matrices for each example are given both during training and test. The approach is based on the empirical risk minimization framework, where we replace the standard loss function by a combination of surrogate losses belonging to the family of proper losses. The actual contribution of each example to the risk is then given by a loss that depends on the cost matrix for the specific example. We then evaluate the use of such example-dependent loss functions in real-world binary and multiclass problems, namely credit risk assessment and musical genre classification. Using different neural network architectures, we show that with the appropriate choice of the example-dependent losses, we can outperform conventional cost-sensitive methods in terms of total cost, making a more efficient use of cost information during training and test as compared to existing discriminative approaches.'
volume: 94
URL: http://proceedings.mlr.press/v94/hepburn18a.html
PDF: http://proceedings.mlr.press/v94/hepburn18a/hepburn18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-hepburn18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Hepburn
given: Alexander
- family: McConville
given: Ryan
- family: Santos-Rodríguez
given: Raúl
- family: Cid-Sueiro
given: Jesús
- family: García-García
given: Dario
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 52-66
id: hepburn18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 52
lastpage: 66
published: 2018-11-05 00:00:00 +0000
- title: 'REBAGG: REsampled BAGGing for Imbalanced Regression'
abstract: 'The problem of imbalanced domains is important in multiple real-world applications. This problem has been thoroughly studied for classification tasks. In particular, the adaptation of ensembles to tackle imbalanced domains has shown important advantages in a classification context. Still, for imbalanced regression problems only a few solutions exist. Moreover, the capabilities of ensembles for dealing with imbalanced regression tasks are yet to be explored. In this paper we present the REsampled BAGGing (REBAGG) algorithm, a bagging-based ensemble method that incorporates data pre-processing strategies for addressing imbalanced domains in regression tasks. The extensive experimental evaluation conducted shows the advantage of our proposal in a diverse set of domains and learning algorithms.'
volume: 94
URL: http://proceedings.mlr.press/v94/branco18a.html
PDF: http://proceedings.mlr.press/v94/branco18a/branco18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-branco18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Branco
given: Paula
- family: Torgo
given: Luís
- family: Ribeiro
given: Rita P.
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 67-81
id: branco18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 67
lastpage: 81
published: 2018-11-05 00:00:00 +0000
- title: 'Undersampled Majority Class Ensemble for highly imbalanced binary classification'
abstract: 'This work uses an ensemble approach to address the problem of highly imbalanced data classification. The paper proposes UMCE, a multiple classifier system based on a k-fold division of the majority class that creates a pool of classifiers, breaking one imbalanced problem into many balanced ones while ensuring that all available samples are used in the training procedure. The algorithm, with five proposed fusers and a pruning method based on the statistical dependencies among the classifier responses on the testing set, was evaluated through computer experiments carried out on benchmark datasets with two different base classifiers.'
volume: 94
URL: http://proceedings.mlr.press/v94/ksieniewicz18a.html
PDF: http://proceedings.mlr.press/v94/ksieniewicz18a/ksieniewicz18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-ksieniewicz18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Ksieniewicz
given: Pawel
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 82-94
id: ksieniewicz18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 82
lastpage: 94
published: 2018-11-05 00:00:00 +0000
- title: 'ImWeights: Classifying Imbalanced Data Using Local and Neighborhood Information'
abstract: 'Preprocessing methods for imbalanced data transform the training data to a form more suitable for learning classifiers. Most of these methods either focus on local relationships between single training examples or analyze the global characteristics of the data, such as the class imbalance ratio in the dataset. However, they do not sufficiently exploit the combination of both these views. In this paper, we put forward a new data preprocessing method called ImWeights, which weights training examples according to their local difficulty (safety) and the vicinity of larger minority clusters (gravity). Experiments with real-world datasets show that ImWeights is on par with local and global preprocessing methods, while being the least memory intensive. The introduced notion of minority cluster gravity opens new lines of research for specialized preprocessing methods and classifier modifications for imbalanced data.'
volume: 94
URL: http://proceedings.mlr.press/v94/lango18a.html
PDF: http://proceedings.mlr.press/v94/lango18a/lango18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-lango18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Lango
given: Mateusz
- family: Brzezinski
given: Dariusz
- family: Stefanowski
given: Jerzy
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 95-109
id: lango18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 95
lastpage: 109
published: 2018-11-05 00:00:00 +0000
- title: 'On the Need of Class Ratio Insensitive Drift Tests for Data Streams'
abstract: 'Early approaches to detect concept drifts in data streams without actual class labels aim at minimizing external labeling costs. However, their functionality is dubious when presented with changes in the proportion of the classes over time, as such methods keep reporting concept drifts that would not damage the performance of the current classification model. In this paper, we present an approach that detects changes in the distribution of the features while remaining insensitive to changes in the distribution of the classes. The method also provides an estimate of the current class ratio and uses it to adapt the threshold of a classification model trained with balanced data. We show that the classification performance achieved by such a modified classifier is greater than that of a classifier trained with the same class distribution as the current imbalanced data.'
volume: 94
URL: http://proceedings.mlr.press/v94/maletzke18a.html
PDF: http://proceedings.mlr.press/v94/maletzke18a/maletzke18a.pdf
edit: https://github.com/mlresearch/v94/edit/gh-pages/_posts/2018-11-05-maletzke18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications'
publisher: 'PMLR'
author:
- family: Maletzke
given: André
- family: Reis
given: Denis
- family: Cherman
given: Everton
- family: Batista
given: Gustavo
editor:
- family: Torgo
given: Luís
- family: Matwin
given: Stan
- family: Japkowicz
given: Nathalie
- family: Krawczyk
given: Bartosz
- family: Moniz
given: Nuno
- family: Branco
given: Paula
page: 110-124
id: maletzke18a
issued:
date-parts:
- 2018
- 11
- 5
firstpage: 110
lastpage: 124
published: 2018-11-05 00:00:00 +0000