- title: 'Learning with Imbalanced Domains: Preface' volume: 74 URL: https://proceedings.mlr.press/v74/torgo17a.html PDF: http://proceedings.mlr.press/v74/torgo17a/torgo17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-torgo17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 1-6 id: torgo17a issued: date-parts: - 2017 - 10 - 11 firstpage: 1 lastpage: 6 published: 2017-10-11 00:00:00 +0000 - title: 'Influence of minority class instance types on SMOTE imbalanced data oversampling' abstract: 'Despite more than two decades of intense research, learning from imbalanced data still remains one of the major difficulties posed for computational intelligence systems. Among the plethora of techniques dedicated to alleviating this problem, preprocessing algorithms are considered among the most efficient ones. They aim at re-balancing the training set by either undersampling the majority class or oversampling the minority one. Here, the Synthetic Minority Oversampling Technique, commonly known as SMOTE, stands as the most popular solution that introduces artificial instances on the basis of the minority class neighborhood distribution. However, many recent works point out that the imbalance ratio itself is not the sole source of learning difficulties in such scenarios. One should take a deeper look into the minority class structure in order to identify which instances influence the performance of classifiers in the most significant manner. 
In this paper, we propose to investigate the role of minority class instance types on the performance of SMOTE. To achieve this, instead of uniformly oversampling the minority class, we preprocess only selected subsets of instances, based on their individual difficulties. The experimental study shows that such selective oversampling leads to improved classification performance.' volume: 74 URL: https://proceedings.mlr.press/v74/skryjomski17a.html PDF: http://proceedings.mlr.press/v74/skryjomski17a/skryjomski17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-skryjomski17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Przemysław family: Skryjomski - given: Bartosz family: Krawczyk editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 7-21 id: skryjomski17a issued: date-parts: - 2017 - 10 - 11 firstpage: 7 lastpage: 21 published: 2017-10-11 00:00:00 +0000 - title: 'A Network Perspective on Stratification of Multi-Label Data' abstract: 'We present a new approach to stratifying multi-label data for classification purposes based on the iterative stratification approach proposed by Sechidis et al. in an ECML PKDD 2011 paper. Our method extends the iterative approach to take into account second-order relationships between labels. The obtained results are evaluated using statistical properties of the obtained strata as presented by Sechidis. We also propose new statistical measures relevant to second-order quality: label pairs distribution, the percentage of label pairs without positive evidence in folds and label pair - fold pairs that have no positive evidence for the label pair. 
We verify the impact of new methods on classification performance of Binary Relevance, Label Powerset and a fast greedy community detection based label space partitioning classifier. The proposed approach lowers the variance of classification quality, improves label pair oriented measures and example distribution while maintaining a competitive quality in label-oriented measures. We also witness an increase in stability of network characteristics.' volume: 74 URL: https://proceedings.mlr.press/v74/szyma%C5%84ski17a.html PDF: http://proceedings.mlr.press/v74/szymański17a/szymański17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-szymański17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Piotr family: Szymański - given: Tomasz family: Kajdanowicz editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 22-35 id: szymański17a issued: date-parts: - 2017 - 10 - 11 firstpage: 22 lastpage: 35 published: 2017-10-11 00:00:00 +0000 - title: 'SMOGN: a Pre-processing Approach for Imbalanced Regression' abstract: 'The problem of imbalanced domains, framed within predictive tasks, is relevant in many practical applications. When dealing with imbalanced domains a performance degradation is usually observed on the most rare and relevant cases for the user. This problem has been thoroughly studied within a classification setting where the target variable is nominal. The exploration of this problem in other contexts is more recent within the research community. For regression tasks, where the target variable is continuous, only a few solutions exist. Pre-processing strategies are among the most successful proposals for tackling this problem. 
In this paper we propose a new pre-processing approach for dealing with imbalanced regression. Our algorithm, SMOGN, incorporates two existing proposals trying to solve problems detected in both of them. We show that SMOGN has advantages in comparison to other approaches. We also show that our method has a different impact on the learners used, displaying more advantages for Random Forest and Multivariate Adaptive Regression Splines learners.' volume: 74 URL: https://proceedings.mlr.press/v74/branco17a.html PDF: http://proceedings.mlr.press/v74/branco17a/branco17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-branco17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Paula family: Branco - given: Luís family: Torgo - given: Rita P. family: Ribeiro editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 36-50 id: branco17a issued: date-parts: - 2017 - 10 - 11 firstpage: 36 lastpage: 50 published: 2017-10-11 00:00:00 +0000 - title: 'Stacked-MLkNN: A stacking based improvement to Multi-Label k-Nearest Neighbours' abstract: 'Multi-label classification deals with problems where each datapoint can be assigned to more than one class, or label, at the same time. The simplest approach for such problems is to train independent binary classification models for each label and use these models to independently predict a set of relevant labels for a datapoint. MLkNN is an instance-based lazy learning algorithm for multi-label classification that takes this approach. MLkNN, and similar algorithms, however, do not exploit associations which may exist between the set of potential labels. These methods also suffer from imbalance in the frequency of labels in a training dataset. 
This work attempts to improve the predictions of MLkNN by implementing a two-layer stack-like method, Stacked-MLkNN, which exploits the label associations. Experiments show that Stacked-MLkNN produces better predictions than MLkNN and several other state-of-the-art instance-based learning algorithms.' volume: 74 URL: https://proceedings.mlr.press/v74/pakrashi17a.html PDF: http://proceedings.mlr.press/v74/pakrashi17a/pakrashi17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-pakrashi17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Arjun family: Pakrashi - given: Brian family: Mac Namee editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 51-63 id: pakrashi17a issued: date-parts: - 2017 - 10 - 11 firstpage: 51 lastpage: 63 published: 2017-10-11 00:00:00 +0000 - title: 'Sampling a Longer Life: Binary versus One-class classification Revisited' abstract: 'When faced with imbalanced domains, practitioners have one of two choices: if the imbalance is manageable, sampling or other corrective measures can be utilized in conjunction with binary classifiers (BCs). Beyond a certain point, however, the imbalance becomes too extreme and one-class classifiers (OCCs) are required. Whilst the literature offers many advances in terms of algorithms and understanding, there remains a need to connect our theoretical advances to the most practical of decisions. Specifically, given a dataset with some level of complexity and imbalance, which classification approach should be applied? In this paper, we establish a relationship between these facets in order to help guide the decision regarding when to apply OCC versus BC. Our results show that sampling provides an edge over OCCs on complex domains. 
Alternatively, OCCs are a good choice on less complex domains that exhibit unimodal properties. Class overlap, on the other hand, has a more uniform impact across all methods.' volume: 74 URL: https://proceedings.mlr.press/v74/bellinger17a.html PDF: http://proceedings.mlr.press/v74/bellinger17a/bellinger17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-bellinger17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Colin family: Bellinger - given: Shiven family: Sharma - given: Osmar R. family: Zaı̈ane - given: Nathalie family: Japkowicz editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 64-78 id: bellinger17a issued: date-parts: - 2017 - 10 - 11 firstpage: 64 lastpage: 78 published: 2017-10-11 00:00:00 +0000 - title: 'Improving Resampling-based Ensemble in Churn Prediction' abstract: 'Dealing with class imbalance is a challenging issue in churn prediction. Although resampling-based ensemble solutions have demonstrated their superiority in many fields, previous research shows that they cannot improve the profit-based measure in churn prediction. In this paper, we explore the impact of the class ratio in the training subsets on the predictive performance of resampling-based ensemble techniques based on experiments on real-world churn prediction data sets. The experimental results show that the setting of the class ratio has a great impact on the model performance. It is also found that by choosing suitable class ratios in the training subsets, UnderBagging and Balanced Random Forests can significantly improve profits brought by the churn prediction model. The demonstrated results provide some guidelines for both academic and industrial practitioners.' 
volume: 74 URL: https://proceedings.mlr.press/v74/zhu17a.html PDF: http://proceedings.mlr.press/v74/zhu17a/zhu17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-zhu17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Bing family: Zhu - given: Seppe family: Broucke - given: Bart family: Baesens - given: Sebastián family: Maldonado editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 79-91 id: zhu17a issued: date-parts: - 2017 - 10 - 11 firstpage: 79 lastpage: 91 published: 2017-10-11 00:00:00 +0000 - title: 'Predicting Defective Engines using Convolutional Neural Networks on Temporal Vibration Signals' abstract: 'This paper addresses for the first time the problem of engines’ damage prediction using huge amounts of imbalanced data from "structure borne noise" signals related to the internal engine excitation. We propose the usage of a convolutional neural network on our temporal input signals, subsequently combined with additional static features. Using informative mini batches during training we take the imbalance of the data into account. The experimental results indicate good performance in detecting the minority class on our large real-world use case.' 
volume: 74 URL: https://proceedings.mlr.press/v74/g%C3%BCnnemann17a.html PDF: http://proceedings.mlr.press/v74/günnemann17a/günnemann17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-günnemann17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Nikou family: Günnemann - given: Jürgen family: Pfeffer editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 92-102 id: günnemann17a issued: date-parts: - 2017 - 10 - 11 firstpage: 92 lastpage: 102 published: 2017-10-11 00:00:00 +0000 - title: 'Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies' abstract: 'Domain adaptation is the task of transforming a model trained using data from a source domain to a different target domain. In Unsupervised Domain Adaptation (UDA), we do not assume any labelled training data from the target domain. In this paper, we consider the problem of UDA in the context of Part-of-Speech (POS) tagging. Specifically, we study the effect of data imbalance on UDA of POS tagging, and compare different pivot selection strategies for accurately adapting a POS tagger trained using some source domain data to a target domain. We propose the use of F-score to select pivots using available labelled data in the source domain. Our experimental results on a benchmark dataset for cross-domain POS tagging show that using frequency combined with F-scores for selecting pivots in the source labelled data produces the best results.' 
volume: 74 URL: https://proceedings.mlr.press/v74/cui17a.html PDF: http://proceedings.mlr.press/v74/cui17a/cui17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-cui17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Xia family: Cui - given: Frans family: Coenen - given: Danushka family: Bollegala editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 103-115 id: cui17a issued: date-parts: - 2017 - 10 - 11 firstpage: 103 lastpage: 115 published: 2017-10-11 00:00:00 +0000 - title: 'Tunable Plug-In Rules with Reduced Posterior Certainty Loss in Imbalanced Datasets' abstract: 'Classifiers have difficulty recognizing under-represented minorities in imbalanced datasets, due to their focus on minimizing the overall misclassification error. This introduces predictive biases against minority classes. Post-processing plug-in rules are popular for tackling class imbalance, but they often affect the certainty of base classifier posteriors, when the latter already perform correct classification. This shortcoming makes them ill-suited to scoring tasks, where informative posterior scores are required for human interpretation. To this end, we propose the $ILoss$ metric to measure the impact of imbalance-aware classifiers on the certainty of posterior distributions. We then generalize post-processing plug-in rules in an easily tunable framework and theoretically show that this framework tends to improve performance balance. Finally, we experimentally assert that appropriate usage of our framework can reduce $ILoss$ while yielding similar performance, with respect to common imbalance-aware measures, to existing plug-in rules for binary problems.' 
volume: 74 URL: https://proceedings.mlr.press/v74/krasanakis17a.html PDF: http://proceedings.mlr.press/v74/krasanakis17a/krasanakis17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-krasanakis17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Emmanouil family: Krasanakis - given: Eleftherios family: Spyromitros-Xioufis - given: Symeon family: Papadopoulos - given: Yiannis family: Kompatsiaris editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 116-128 id: krasanakis17a issued: date-parts: - 2017 - 10 - 11 firstpage: 116 lastpage: 128 published: 2017-10-11 00:00:00 +0000 - title: 'Evaluation of Ensemble Methods in Imbalanced Regression Tasks' abstract: 'Ensemble methods are well known for providing an advantage over single models in a large range of data mining and machine learning tasks. Their benefits are commonly associated to the ability of reducing the bias and/or variance in learning tasks. Ensembles have been studied both for classification and regression tasks with uniform domain preferences. However, only for imbalanced classification these methods were thoroughly studied. In this paper we present an empirical study concerning the predictive ability of ensemble methods bagging and boosting in regression tasks, using 20 data sets with imbalanced distributions, and assuming non-uniform domain preferences. Results show that ensemble methods are capable of providing improvements in predictive ability towards under-represented values, and that this improvement influences the predictive ability of models concerning the average behaviour of the data. 
Results also show that the smaller data sets are prone to larger improvements in predictive accuracy and that no conclusion could be drawn when considering the percentage of rare cases alone.' volume: 74 URL: https://proceedings.mlr.press/v74/moniz17a.html PDF: http://proceedings.mlr.press/v74/moniz17a/moniz17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-moniz17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Nuno family: Moniz - given: Paula family: Branco - given: Luís family: Torgo editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 129-140 id: moniz17a issued: date-parts: - 2017 - 10 - 11 firstpage: 129 lastpage: 140 published: 2017-10-11 00:00:00 +0000 - title: 'Controlling Imbalanced Error in Deep Learning with the Log Bilinear Loss' abstract: 'Deep learning has become the method of choice for many machine learning tasks in recent years, and especially for multi-class classification. The most common loss function used in this context is the cross-entropy loss. While this function is insensitive to the identity of the assigned class in the case of misclassification, in practice it is very common to have imbalanced sensitivity to error, meaning some wrong assignments are much worse than others. Here we present the bilinear loss (and the related log-bilinear loss), which differentially penalizes the different wrong assignments of the model. We thoroughly test the proposed method using standard models and benchmark image datasets.' 
volume: 74 URL: https://proceedings.mlr.press/v74/resheff17a.html PDF: http://proceedings.mlr.press/v74/resheff17a/resheff17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-resheff17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Yehezkel S. family: Resheff - given: Amit family: Mandelbom - given: Daphna family: Weinshall editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 141-151 id: resheff17a issued: date-parts: - 2017 - 10 - 11 firstpage: 141 lastpage: 151 published: 2017-10-11 00:00:00 +0000 - title: 'Unsupervised Classification of Speaker Profiles as a Point Anomaly Detection Task' abstract: 'This paper presents an evaluation of three different anomaly detection methods over different feature sets. The three anomaly detectors are based respectively on a Gaussian Mixture Model (GMM), a One-Class SVM and an Isolation Forest. The considered feature sets are built from personality evaluation and audio signal. Personality evaluations are extracted from the BFI-10 Questionnaire, which allows one to manually evaluate five personality traits (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism). From the audio signal, we automatically extract a prosodic feature set, which performs well in affective computing. The different combinations of models and feature sets are evaluated on the SSPNET-Personality corpus, which has already been used in several experiments, including a previous work on separating two types of personality profiles in a supervised way. In this work, we propose an evaluation of the three anomaly detectors with consideration to the features used. 
Results show that, regardless of the feature set, the GMM-based method is the most efficient one (it obtains a 0.96 ROC-AUC score with the best feature set). The prosodic feature set seems to be a good compromise between performance (a 0.91 ROC-AUC score with the GMM-based method) and ease of extraction.' volume: 74 URL: https://proceedings.mlr.press/v74/fayet17a.html PDF: http://proceedings.mlr.press/v74/fayet17a/fayet17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-fayet17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Cedric family: Fayet - given: Arnaud family: Delhay - given: Damien family: Lolive - given: Pierre-François family: Marteau editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 152-163 id: fayet17a issued: date-parts: - 2017 - 10 - 11 firstpage: 152 lastpage: 163 published: 2017-10-11 00:00:00 +0000 - title: 'Dealing with the task of imbalanced, multidimensional data classification using ensembles of exposers' abstract: 'Recently, the problem of imbalanced data has become the focus of intense research in the machine learning community. The following work utilizes an approach of transforming the data space into another one in which the classification task may become easier. The paper proposes a tool, based on a photographic metaphor, to build a classifier ensemble, combined with a random subspace approach. The developed solution is insensitive to sample size and robust to dimensionality increase, which allows a regularization of the feature space, reducing the impact of biased classes. The proposed approach was evaluated on the basis of computer experiments carried out on benchmark and synthetic datasets.' 
volume: 74 URL: https://proceedings.mlr.press/v74/ksieniewicz17a.html PDF: http://proceedings.mlr.press/v74/ksieniewicz17a/ksieniewicz17a.pdf edit: https://github.com/mlresearch//v74/edit/gh-pages/_posts/2017-10-11-ksieniewicz17a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications' publisher: 'PMLR' author: - given: Paweł family: Ksieniewicz - given: Michał family: Woźniak editor: - given: Luís family: Torgo - given: Bartosz family: Krawczyk - given: Paula family: Branco - given: Nuno family: Moniz page: 164-175 id: ksieniewicz17a issued: date-parts: - 2017 - 10 - 11 firstpage: 164 lastpage: 175 published: 2017-10-11 00:00:00 +0000