Proceedings of Machine Learning Research
Proceedings of The International Workshop on Cost-Sensitive Learning
Held in SDM, San Diego, California, USA on 05 May 2018
Published as Volume 88 by the Proceedings of Machine Learning Research on 01 August 2018.
Volume Edited by:
Series Editors:
Neil D. Lawrence
Mark Reid
http://proceedings.mlr.press/v88/

Cost-Sensitive Learning: Preface
Wed, 01 Aug 2018 00:00:00 +0000
http://proceedings.mlr.press/v88/torgo18a.html

Classifier Performance Estimation with Unbalanced, Partially Labeled Data
Class imbalance and lack of ground truth are two significant problems in modern machine learning research. These problems are especially pressing in operational contexts where the total number of data points is extremely large and the cost of obtaining labels is very high. In the face of these issues, accurate estimation of the performance of a detection or classification system is crucial to inform decisions based on the observations. This paper presents a framework for estimating performance of a binary classifier in such a context. We focus on the scenario where each set of measurements has been reduced to a score, and the operator only investigates data when the score exceeds a threshold. The operator is blind to the number of missed detections, so performance estimation targets two quantities: recall and the derivative of precision with respect to recall. Measuring with respect to error in these two metrics, simulations in this context demonstrate that labeling outliers not only outperforms random labeling, but often matches performance of an adaptive method that attempts to choose the optimal data for labeling. Application to real anomaly detection data confirms the utility of the approach, and suggests direction for future work.
Wed, 01 Aug 2018 00:00:00 +0000
http://proceedings.mlr.press/v88/miller18a.html
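
As a rough illustration of the two quantities named in the abstract above, the following sketch computes precision and recall at a score threshold and a finite-difference approximation of the slope of precision with respect to recall. It assumes every data point is labeled, which is exactly what the paper's setting lacks, so it only defines the targets and is not the paper's estimation framework. The function names and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def precision_recall_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when only points with score > threshold are flagged.

    Assumes fully known binary labels (1 = positive, 0 = negative).
    """
    flagged = [y for s, y in zip(scores, labels) if s > threshold]
    true_pos = sum(flagged)
    total_pos = sum(labels)
    precision = true_pos / len(flagged) if flagged else math.nan
    recall = true_pos / total_pos if total_pos else math.nan
    return precision, recall


def precision_slope(
    scores: Sequence[float], labels: Sequence[int], high: float, low: float
) -> float:
    """Finite-difference stand-in for d(precision)/d(recall) between two thresholds."""
    p_hi, r_hi = precision_recall_at_threshold(scores, labels, high)
    p_lo, r_lo = precision_recall_at_threshold(scores, labels, low)
    return (p_lo - p_hi) / (r_lo - r_hi)


# Toy data (hypothetical): five scored points with known labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
precision, recall = precision_recall_at_threshold(scores, labels, 0.5)
slope = precision_slope(scores, labels, high=0.5, low=0.35)
```

In the operational setting the abstract describes, the labels below the threshold are unknown, so `total_pos` cannot be computed directly and has to be estimated from whichever points are chosen for labeling.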

Cost-sensitive Classifier Selection when there is Additional Cost Information
Machine learning models are increasing in popularity in many domains as they are shown to be able to solve difficult problems. However, selecting a model to implement when there are various alternatives is a difficult problem. Receiver operating characteristic (ROC) curves are useful for selecting binary classification models for real world problems. However, ROC curves only consider the misclassification cost of the classifier. The total cost of a classification system includes various other types of cost including implementation, computation, and feature costs. To extend the ROC analysis to include this additional cost information, the ROC Convex Hull with Cost (ROCCHC) method is introduced. This method extends the ROC Convex Hull (ROCCH) method, which is used to select potentially optimal classifiers in the ROC space using misclassification cost, by selecting potentially optimal classifiers using this additional cost information. The ROCCHC method is tested using three binary classification data sets, each of which includes real feature costs as the additional cost information. Competing classifiers are created with the CART algorithm by using each combination of features or sensors for each data set. The ROCCHC method reduces the classifier decision space to 4%, 9%, and 0.02%. These results are compared to the current ROCCH method, which misses 91%, 58%, and 6% of potentially optimal classifiers because the method does not include the additional cost information.
Wed, 01 Aug 2018 00:00:00 +0000
http://proceedings.mlr.press/v88/meekins18a.html
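
For readers unfamiliar with the baseline mentioned in the abstract above, here is a minimal sketch of the plain ROC Convex Hull (ROCCH) selection step: classifiers are points (false positive rate, true positive rate), and only those on the upper convex hull between (0, 0) and (1, 1) are potentially optimal under misclassification cost alone. It does not implement the ROCCHC extension with additional cost, whose details are not given here. The function names and the example points are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: Iterable[Point]) -> List[Point]:
    """Upper convex hull of ROC points, including the trivial classifiers.

    (0, 0) and (1, 1) stand for "always negative" and "always positive".
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # from the point before it to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical classifiers in ROC space.
classifiers = [(0.1, 0.6), (0.3, 0.7), (0.4, 0.9), (0.5, 0.5)]
hull = roc_convex_hull(classifiers)
```

With these example points, (0.3, 0.7) and (0.5, 0.5) fall below the hull and would be discarded by ROCCH regardless of the class distribution or misclassification costs.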

Recognizing Cuneiform Signs Using Graph Based Methods
The cuneiform script constitutes one of the earliest systems of writing and is realized by wedge-shaped marks on clay tablets. A tremendous number of cuneiform tablets have already been discovered and are incrementally digitalized and made available to automated processing. As reading cuneiform script is still a manual task, we address the real-world application of recognizing cuneiform signs by two graph based methods with complementary runtime characteristics. We present a graph model for cuneiform signs together with a tailored distance measure based on the concept of the graph edit distance. We propose efficient heuristics for its computation and demonstrate its effectiveness in classification tasks experimentally. To this end, the distance measure is used to implement a nearest neighbor classifier leading to a high computational cost for the prediction phase with increasing training set size. In order to overcome this issue, we propose to use CNNs adapted to graphs as an alternative approach shifting the computational cost to the training phase. We demonstrate the practicability of both approaches in an experimental comparison regarding runtime and prediction accuracy. Although currently available annotated real-world data is still limited, we obtain a high accuracy using CNNs, in particular, when the training set is enriched by augmented examples.
Wed, 01 Aug 2018 00:00:00 +0000
http://proceedings.mlr.press/v88/kriege18a.html
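
The prediction-time cost noted in the abstract above can be seen in a minimal nearest neighbor sketch: every query is compared against every training graph, so the number of distance evaluations grows linearly with the training set. The distance used here (size of the symmetric difference of edge sets) is only a hypothetical stand-in and is not the paper's graph edit distance heuristic. All names and the toy graphs are assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]  # a graph reduced to its edge set, for illustration only


def edge_set_distance(g1: Graph, g2: Graph) -> float:
    """Placeholder distance: number of edges present in exactly one graph."""
    return float(len(g1 ^ g2))


def nearest_neighbor_predict(
    query: Graph,
    train: List[Tuple[Graph, str]],
    distance: Callable[[Graph, Graph], float],
) -> Optional[str]:
    """Label of the closest training graph; one distance call per training example."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    for graph, label in train:
        d = distance(query, graph)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy training set (hypothetical sign labels).
train = [
    (frozenset({(0, 1), (1, 2)}), "sign_a"),
    (frozenset({(0, 1), (0, 2), (0, 3)}), "sign_b"),
]
query = frozenset({(0, 1), (1, 2), (2, 3)})
predicted = nearest_neighbor_predict(query, train, edge_set_distance)
```

Because the distance function is passed in as a parameter, a more faithful graph distance could be swapped in without changing the classifier loop.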