Proceedings of Machine Learning Research
Proceedings of the Fourth Asian Conference on Machine Learning
Held at Singapore Management University, Singapore, on 04-06 November 2012
Published as Volume 25 by the Proceedings of Machine Learning Research on 17 November 2012.
Volume Edited by:
Steven C. H. Hoi
Wray Buntine
Series Editors:
Neil D. Lawrence
http://proceedings.mlr.press/v25/
Sun, 15 Jul 2018 14:36:23 +0000
Multi-view Positive and Unlabeled Learning
Learning with Positive and Unlabeled instances (PU learning) arises widely in information retrieval applications. To address the unavailability of negative instances, most existing PU learning approaches must either identify a reliable set of negative instances from the unlabeled data or estimate probability densities as an intermediate step. However, inaccurate negative-instance identification or poor density estimation may severely degrade the overall performance of the final predictive model. To this end, we propose a novel PU learning method based on density ratio estimation without constructing any sets of negative instances or estimating any intermediate densities. To further boost PU learning performance, we extend our proposed learning method in a multi-view manner by utilizing multiple heterogeneous sources. Extensive experimental studies demonstrate the effectiveness of our proposed methods, especially when positive labeled data are limited.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/zhou12.html
Online Rank Aggregation
We consider an online learning framework where the task is to predict a permutation which represents a ranking of n fixed objects. At each trial, the learner incurs a loss defined as the Kendall tau distance between the predicted permutation and the true permutation given by the adversary. This setting is quite natural in many situations such as information retrieval and recommendation tasks. We prove a lower bound on the cumulative loss and hardness results. Then, we propose an algorithm for this problem and prove a relative loss bound which shows that our algorithm is close to optimal.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/yasutake12.html
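The Kendall tau loss named in the Online Rank Aggregation abstract can be illustrated with a minimal sketch. The function name `kendall_tau_distance` and the list-based encoding of permutations are assumptions made for this example; the quadratic pair count is the textbook definition, not the paper's algorithm.

```python
from itertools import combinations


def kendall_tau_distance(pred, true):
    """Count the object pairs that the two rankings order differently."""
    # Position of each object in the true ranking.
    pos = {obj: i for i, obj in enumerate(true)}
    # Rewrite the predicted ranking in terms of true positions.
    seq = [pos[obj] for obj in pred]
    # Every inversion in seq corresponds to one discordant pair.
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])
```

A fully reversed ranking of n objects attains the maximum loss of n(n-1)/2, so the per-trial loss in this setting is bounded by that value.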
Practical Large Scale Classification with Additive Kernels
For classification problems with millions of training examples or dimensions, accuracy, training and testing speed and memory usage are the main concerns. Recent advances have allowed linear SVM to tackle problems with moderate time and space cost, but for many tasks in computer vision, additive kernels would have higher accuracies. In this paper, we propose the PmSVM-LUT algorithm that employs Look-Up Tables to boost the training and testing speed and save memory usage of additive kernel SVM classification, in order to meet the needs of large scale problems. The PmSVM-LUT algorithm is based on PmSVM (Wu, 2012), which employed polynomial approximation for the gradient function to speed up the dual coordinate descent method. We also analyze the polynomial approximation numerically to demonstrate its validity. Empirically, our algorithm is faster than PmSVM and feature mapping on many datasets with higher classification accuracies and can save up to 60% memory usage as well.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/yang12.html
Multi-objective Monte-Carlo Tree Search
Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as the hyper-volume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is first compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MO-MCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non-RL-based state of the art, albeit with a higher computational cost.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/wang12b.html
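As a rough illustration of the hyper-volume indicator mentioned in the MO-MCTS abstract, the sketch below computes it for two objectives. It assumes both objectives are maximized and that a reference point is supplied by the caller; the name `hypervolume_2d` is hypothetical, and the action selection criterion that the paper builds on top of the indicator is not reproduced here.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded below by `reference` (maximization)."""
    rx, ry = reference
    area = 0.0
    best_y = ry  # highest second-objective value covered so far
    # Sweep from the largest first-objective value to the smallest.
    for x, y in sorted(points, key=lambda p: p[0], reverse=True):
        if x > rx and y > best_y:
            # Only the strip above best_y is new area.
            area += (x - rx) * (y - best_y)
            best_y = y
    return area
```

Dominated points add no area, so the indicator only grows when a candidate reward vector extends the current Pareto front.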
Spatial Locality-Aware Sparse Coding and Dictionary Learning
Nonlinear encoding of SIFT features has recently shown good promise in image classification. This scheme is able to reduce the training complexity of the traditional bag-of-feature approaches while achieving better performance. As a result, it is suitable for large-scale image classification applications. However, existing nonlinear encoding methods do not explicitly consider the spatial relationship when encoding the local features; merely deferring the use of spatial information to a later stage, e.g. through spatial pyramid matching, is largely inadequate. In this paper, we propose a joint sparse coding and dictionary learning scheme that takes the spatial information into consideration in encoding. Our experiments on synthetic data and benchmark data demonstrate that the proposed scheme can learn a better dictionary and achieve higher classification accuracy.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/wang12a.html
Conditional validity of inductive conformal predictors
Conformal predictors are set predictors that are automatically valid in the sense of having coverage probability equal to or exceeding a given confidence level. Inductive conformal predictors are a computationally efficient version of conformal predictors satisfying the same property of validity. However, inductive conformal predictors have only been known to control unconditional coverage probability. This paper explores various versions of conditional validity and various ways to achieve them using inductive conformal predictors and their modifications.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/vovk12.html
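The unconditional validity described in the inductive conformal predictor abstract can be illustrated for regression with a minimal sketch. It assumes absolute residuals on a held-out calibration set as the nonconformity score; `icp_interval` and its parameters are hypothetical names, and the conditional variants studied in the paper are not shown.

```python
import math


def icp_interval(cal_residuals, point_prediction, epsilon):
    """Prediction interval with target coverage 1 - epsilon."""
    n = len(cal_residuals)
    # Rank of the calibration score used as the interval half-width.
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        # Too few calibration points for this confidence level.
        return (-math.inf, math.inf)
    half_width = sorted(cal_residuals)[k - 1]
    return (point_prediction - half_width, point_prediction + half_width)
```

The `(n + 1)` factor is what gives the coverage guarantee under exchangeability: the test score is treated as one more draw alongside the n calibration scores.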
Multi-Stage Classifier Design
In many classification systems, sensing modalities have different acquisition costs. It is often unnecessary to use every modality to classify a majority of examples. We study a multi-stage system in a prediction time cost reduction setting, where the full data is available for training, but for a test example, measurements in a new modality can be acquired at each stage for an additional cost. We seek decision rules to reduce the average measurement acquisition cost. We formulate an empirical risk minimization (ERM) problem for a multi-stage reject classifier, wherein the stage k classifier either classifies a sample using only the measurements acquired so far or rejects it to the next stage where more attributes can be acquired for a cost. To solve the ERM problem, we factorize the cost function into classification and rejection decisions. We then transform reject decisions into a binary classification problem. We construct a stage-by-stage global surrogate risk, develop an iterative algorithm in the boosting framework and present convergence results. We test our work on synthetic, medical and explosives detection datasets. Our results demonstrate that substantial cost reduction without a significant sacrifice in accuracy is achievable.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/trapeznikov12.html
Two-way Parallel Class Expression Learning
In machine learning, we often encounter datasets that can be described using simple rules and regular exception patterns describing situations where those rules do not apply. In this paper, we propose a two-way parallel class expression learning algorithm that is suitable for this kind of problem. This is a top-down refinement-based class expression learning algorithm for Description Logic (DL). It is distinguished from similar DL learning algorithms in the way it uses the concepts generated by the refinement operator. In our approach, we unify the computation of concepts describing positive and negative examples, but we maintain them separately, and combine them at the end. By doing so, we can avoid the use of negation in the refinement without any loss of generality. Evaluation shows that our approach can reduce the search space significantly, and therefore the learning time is reduced. Our implementation is based on the DL-Learner framework and we inherit the Parallel Class Expression Learning (ParCEL) algorithm design for parallelisation.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/tran12c.html
Learning From Ordered Sets and Applications in Collaborative Ranking
Rankings over sets arise when users choose between groups of items. For example, a group may consist of the movies a user deems worth 5 stars, or of a customized tour package. It turns out that, to model this data type properly, we need to investigate the general combinatorics problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subsets. Inference in this combinatorial space is highly challenging: the space size approaches $(N!/2)\,6.93145^{N+1}$ as $N$ approaches infinity. We propose a split-and-merge Metropolis-Hastings procedure that can explore the state-space efficiently. For discovering hidden aspects in the data, we enrich the model with latent binary variables so that the posteriors can be efficiently evaluated. Finally, we evaluate the proposed model on large-scale collaborative filtering tasks and demonstrate that it is competitive against state-of-the-art methods.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/tran12b.html
Cumulative Restricted Boltzmann Machines for Ordinal Matrix Data Analysis
Ordinal data is omnipresent in almost all multiuser-generated feedback, such as questionnaires and preferences. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture, learning and inference procedures for both vector-variate and matrix-variate ordinal data. We show that our model is able to capture the latent opinion profile of citizens around the world, and is competitive against state-of-the-art collaborative filtering techniques on large-scale public datasets. The model thus has the potential to extend the application of RBMs to diverse domains such as recommendation systems, product reviews and expert assessments.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/tran12a.html
Supervised dimension reduction with topic models
We consider supervised dimension reduction (SDR) for problems with discrete variables. Existing methods are computationally expensive, and often do not take the local structure of data into consideration when searching for a low-dimensional space. In this paper, we propose a novel framework for SDR which is (1) general and flexible so that it can be easily adapted to various unsupervised topic models, (2) able to inherit the scalability of unsupervised topic models, and (3) able to exploit label information and the local structure of data well when searching for a new space. Extensive experiments with adaptations to three models demonstrate that our framework can yield scalable and qualitative methods for SDR. One of those adaptations can perform better than the state-of-the-art method for SDR while enjoying significantly faster speed.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/than12.html
Improved sequence classification using adaptive segmental sequence alignment
Traditional pairwise sequence alignment is based on matching individual samples from two sequences, under time monotonicity constraints. However, in some instances matching two segments of points may be preferred and can result in increased noise robustness. This paper presents an approach to segmental sequence alignment based on adaptive pairwise segmentation. We introduce a distance metric between segments based on average pairwise distances, which addresses deficiencies of prior approaches. We then present a modified pair-HMM that incorporates the proposed distance metric and use it to devise an efficient algorithm to jointly segment and align the two sequences. Our results demonstrate that this new measure of sequence similarity can lead to improved classification performance, while being resilient to noise, on a variety of problems, from EEG to motion sequence classification.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/shariat12.html
Topographic Analysis of Correlated Components
Independent component analysis (ICA) is a method to estimate components which are as statistically independent as possible. However, in many practical applications, the estimated components are not independent. Recent variants of ICA have made use of such residual dependencies to estimate an ordering (topography) of the components. As in ICA, the components in those variants are assumed to be uncorrelated, which might be a rather strict condition. In this paper, we address this shortcoming. We propose a generative model for the source where the components can have linear and higher order correlations, which generalizes models in use so far. Based on the model, we derive a method to estimate topographic representations. In numerical experiments on artificial data, the new method is shown to be more widely applicable than previously proposed extensions of ICA. We learn topographic representations for two kinds of real data sets: for outputs of simulated complex cells in the primary visual cortex and for text data.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/sasaki12.html
Recovering Networks from Distance Data
A fully probabilistic approach to reconstructing Gaussian graphical models from distance data is presented. The main idea is to extend the usual central Wishart model in traditional methods to a likelihood depending only on pairwise distances, thus being independent of geometric assumptions about the underlying Euclidean space. This extension has two advantages: the model becomes invariant against potential bias terms in the measurements, and it can be used in situations where the input is a kernel or distance matrix, without requiring direct access to the underlying vectors. The latter aspect opens up a huge new application field for Gaussian graphical models, as network reconstruction is now possible from any Mercer kernel, be it on graphs, strings, probabilities or more complex objects. We combine this likelihood with a suitable prior to enable Bayesian network inference. We present an efficient MCMC sampler for this model and discuss the estimation of module networks. Experiments demonstrate the high quality and usefulness of the inferred networks.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/prabhakaran12.html
QBoost: Large Scale Classifier Training with Adiabatic Quantum Optimization
We introduce a novel discrete optimization method for training in the context of a boosting framework for large scale binary classifiers. The motivation is to cast the training problem into the format required by existing adiabatic quantum hardware. First we provide theoretical arguments concerning the transformation of an originally continuous optimization problem into one with discrete variables of low bit depth. Next we propose QBoost as an iterative training algorithm in which a subset of weak classifiers is selected by solving a hard optimization problem in each iteration. A strong classifier is incrementally constructed by concatenating the subsets of weak classifiers. We supplement the findings with experiments on one synthetic and two natural data sets and compare against the performance of existing boosting algorithms. Finally, by conducting a quantum Monte Carlo simulation we gather evidence that adiabatic quantum optimization is able to handle the discrete optimization problems generated by QBoost.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/neven12.html
Statistical Models for Exploring Individual Email Communication Behavior
As digital communication devices play an increasingly prominent role in our daily lives, the ability to analyze and understand our communication patterns becomes more important. In this paper, we investigate a latent variable modeling approach for extracting information from individual email histories, focusing in particular on understanding how an individual communicates over time with recipients in their social network. The proposed model consists of latent groups of recipients, each of which is associated with a piecewise-constant Poisson rate over time. Inference of group memberships, temporal changepoints, and rate parameters is carried out via Markov Chain Monte Carlo (MCMC) methods. We illustrate the utility of the model by applying it to both simulated and real-world email data sets.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/navaroli12.html
Sparse Additive Matrix Factorization for Robust PCA and Its Generalization
Principal component analysis (PCA) can be regarded as approximating a data matrix with a low-rank one by imposing sparsity on its singular values, and its robust variant further captures sparse noise. In this paper, we extend such sparse matrix learning methods, and propose a novel unified framework called sparse additive matrix factorization (SAMF). SAMF systematically induces various types of sparsity by the so-called model-induced regularization in the Bayesian framework. We propose an iterative algorithm called the mean update (MU) for the variational Bayesian approximation to SAMF, which gives the globally optimal solution for a large subset of parameters in each step. We demonstrate the usefulness of our method on artificial data and on foreground/background video separation.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/nakajima12.html
Learning and Model-Checking Networks of I/O Automata
We introduce a new statistical relational learning (SRL) approach in which models for structured data, especially network data, are constructed as networks of communicating finite probabilistic automata. Leveraging existing automata learning methods from the area of grammatical inference, we can learn generic models for network entities in the form of automata templates. As is characteristic for SRL techniques, the abstraction level afforded by learning generic templates enables one to apply the learned model to new domains. A main benefit of learning models based on finite automata lies in the fact that one can analyse the resulting models using formal model-checking techniques, which adds a dimension of model analysis not usually available for traditional SRL modeling frameworks.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/mao12.html
On Using Nearly-Independent Feature Families for High Precision and Confidence
Often we require classification at a very high precision level, such as 99%. We report that when very different sources of evidence such as text, audio, and video features are available, combining the outputs of base classifiers trained on each feature type separately (also known as late fusion) can substantially increase the recall of the combination at high precisions, compared to the performance of a single classifier trained on all the feature types (i.e., early fusion) or compared to the individual base classifiers. We show how the probability of a joint false-positive mistake can be upper bounded by the product of individual probabilities of conditional false-positive mistakes, by identifying a simple key criterion that needs to hold. This provides an explanation for the high precision phenomenon, and motivates referring to such feature families as (nearly) independent. We assess the relevant factors for achieving high precision empirically, and explore combination techniques informed by the analysis. We compare a number of early and late fusion methods, and observe that classifier combination via late fusion can more than double the recall at high precision.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/madani12.html
Key Instance Detection in Multi-Instance Learning
The goal of traditional multi-instance learning (MIL) is to predict the labels of the bags, whereas in many real applications, it is desirable to get the instance labels, especially the labels of key instances that trigger the bag labels, in addition to getting bag labels. Such a problem has been largely unexplored before. In this paper, we formulate the Key Instance Detection (KID) problem, and propose a voting framework (VF) solution to KID. The key of VF is to exploit the relationship among instances, represented by a citer kNN graph. This graph is different from commonly used nearest neighbor graphs, but is suitable for KID. Experiments validate the effectiveness of VF for KID. Additionally, VF also outperforms state-of-the-art MIL approaches in bag label prediction.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/liu12b.html
A Convex-Concave Relaxation Procedure Based Subgraph Matching Algorithm
Based on the convex-concave relaxation procedure (CCRP), the (extended) path following algorithms were recently proposed to approximately solve the equal-sized graph matching problem, and exhibited state-of-the-art performance (Zaslavskiy et al., 2009; Liu et al., 2012). However, they cannot be used for subgraph matching since either their convex or concave relaxation is no longer applicable. In this paper we extend the CCRP to tackle subgraph matching, by proposing a convex as well as a concave relaxation of the problem. Since in the context of CCRP, the convex relaxation can be viewed as an initialization of a concave program, we introduce two other initializations for comparison. The graduated assignment algorithm is also included in the experimental comparisons, which confirm the validity of the proposed algorithm.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/liu12a.html
Active Learning with Hinted Support Vector Machine
The abundance of real-world data and limited labeling budget call for active learning, which is an important learning paradigm for reducing human labeling efforts. Many recently developed active learning algorithms consider both uncertainty and representativeness when making querying decisions. However, exploiting representativeness with uncertainty concurrently usually requires tackling sophisticated and challenging learning tasks, such as clustering. In this paper, we propose a new active learning framework, called hinted sampling, which takes both uncertainty and representativeness into account in a simpler way. We design a novel active learning algorithm within the hinted sampling framework with an extended support vector machine. Experimental results validate that the novel active learning algorithm can result in a better and more stable performance than that achieved by state-of-the-art algorithms.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/li12.html
Variational Bayesian Matching
Matching of samples refers to the problem of inferring unknown co-occurrence or alignment between observations in two data sets. Given two sets of equally many samples, the task is to find for each sample a representative sample in the other set, without prior knowledge of a distance measure between the sets. Recently a few alternative solutions have been suggested, based on maximization of joint likelihood or various measures of between-data statistical dependency. In this work we present a variational Bayesian solution for the problem, learning a Bayesian canonical correlation analysis model with a permutation parameter for re-ordering the samples in one of the sets. We approximate the posterior over the permutations, and demonstrate that the resulting matching algorithm clearly outperforms all of the earlier solutions.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/klami12.html
Frustratingly Simplified Deployment in WLAN Localization by Learning from Route Annotation
Wireless LAN (WLAN) localization systems have recently been gaining popularity in the pervasive computing, machine learning and sensor network communities, especially for indoor scenarios where GPS coverage is limited. To accurately predict location, a large number of fingerprints composed of received signal strength values is necessary. Moreover, standard supervised or semi-supervised approaches also require location information for each fingerprint, and this annotation work is rather tedious and time consuming. To reduce the effort and time required to build calibration data, we present a novel calibration methodology, “route-annotation”, and a self-training algorithm for learning from route information effectively. In the proposed calibration methodology, an annotator walks around while measuring fingerprints, then occasionally stops to annotate fingerprints with the route from the previous location to the current location. This calibration reduces work time even compared to partial annotation, while routes carry richer information for learning. The proposed learning algorithm comprises the following two iterative steps: 1) inferring the location of each fingerprint under route constraints and 2) updating parameters. Experimental results on real-world datasets demonstrate that learning from route-annotated data is comparable to state-of-the-art supervised and semi-supervised approaches trained with a large amount of calibration data.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/kawajiri12.html
Preface
Preface to the Proceedings of the 4th Asian Conference on Machine Learning, 4-6th November 2012, Singapore.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/hoi12.html
More Is Better: Large Scale Partially-supervised Sentiment Classification
We describe a bootstrapping algorithm to learn from partially labeled data, and the results of an empirical study of using it to improve the performance of sentiment classification using up to 15 million unlabeled Amazon product reviews. Our experiments cover semi-supervised learning, domain adaptation and weakly supervised learning. In some cases our methods were able to reduce test error by more than half using such large amounts of data.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/haimovitch12.html
Learning Temporal Association Rules on Symbolic Time Sequences
We introduce a temporal pattern model called Temporal Interval Tree Association Rules (Tita rules or Titar). This pattern model can express both uncertainty and temporal inaccuracy of temporal events. Among other things, Tita rules can express the usual time point operators, synchronicity, order, and chaining, as well as temporal negation and disjunctive temporal constraints. Using this representation, we present the Titar learner algorithm that can be used to extract Tita rules from large datasets expressed as Symbolic Time Sequences. The selection of temporal constraints (or time-frames) is at the core of the temporal learning. Our learning algorithm is based on two novel approaches to this problem. The first one is designed to select temporal constraints for the head of temporal association rules. The second selects temporal constraints for the body of such rules. We discuss the evaluation of probabilistic temporal association rules, evaluate our technique with two experiments, introduce a metric to evaluate sets of temporal rules, compare the results with two other approaches and discuss the results.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/guillame-bert12.html
Max-Margin Ratio Machine
In this paper, we investigate the problem of exploiting global information to improve the performance of SVMs on large scale classification problems. We first present a unified general framework for the existing min-max machine methods in terms of within-class dispersions and between-class dispersions. By defining a new within-class dispersion measure, we then propose a novel max-margin ratio machine (MMRM) method that can be formulated as a linear programming problem with scalability for large data sets. Kernels can be easily incorporated into our method to address non-linear classification problems. Our empirical results show that the proposed MMRM approach achieves promising results on large data sets.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/gu12.html
A stochastic bandit algorithm for scratch games
Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees, and they are easy to implement and efficient in practice. We consider a new bandit setting, called “scratch games”, where arm budgets are limited and rewards are drawn without replacement. Using Serfling's inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound on the expected number of plays of a suboptimal arm is lower than that of the UCB1 policy. We illustrate this result on both synthetic problems and realistic problems (ad-serving and emailing campaign optimization).
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/feraud12.html
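For context on the UCB1 policy that the scratch games abstract uses as its point of comparison, here is a minimal sketch of the standard UCB1 arm selection rule. The function names and the list-based bookkeeping are assumptions for illustration; the Serfling-based index proposed in the paper for sampling without replacement is not reproduced.

```python
import math


def ucb1_index(mean_reward, pulls, total_pulls):
    """Empirical mean plus the UCB1 exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def select_arm(means, pulls):
    """Pick the next arm given per-arm empirical means and pull counts."""
    # Play every arm once before using the index.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    total = sum(pulls)
    return max(range(len(means)), key=lambda a: ucb1_index(means[a], pulls[a], total))
```

The bonus shrinks as an arm is pulled more often, so arms with equal empirical means are broken in favour of the less explored one.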
Online Learning of a Dirichlet Process Mixture of Generalized Dirichlet Distributions for Simultaneous Clustering and Localized Feature Selection
Online algorithms allow data instances to be processed in a sequential way, which is important for large-scale and real-time applications. In this paper, we propose a novel online clustering approach based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions, which can be considered as an extension of the finite GD mixture model to the infinite case. Our approach is built on nonparametric Bayesian analysis where the determination of the number of clusters is sidestepped by assuming an infinite number of mixture components. Moreover, an unsupervised localized feature selection scheme is integrated with the proposed nonparametric framework to improve the clustering performance. By learning the proposed model in an online manner using a variational approach, all the involved parameters and feature saliencies are estimated simultaneously and effectively in closed form. The proposed online infinite mixture model is validated through both synthetic data sets and two challenging real-world applications, namely text document clustering and online human face detection.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/fan12.html
AIC and BIC based approaches for SVM parameter value estimation with RBF kernels
We study the problem of selecting the best parameter values to use for a support vector machine (SVM) with RBF kernel. Our methods extend the well-known formulas for AIC and BIC, and we present two alternative approaches for calculating the necessary likelihood functions for these formulas. Our first approach is based on using the distances of support vectors from the separating hyperplane. Our second approach estimates the probability that the SVM hyperplane coincides with the Bayes classifier, by analysing the disposition of points in the kernel feature space. We experimentally compare our two approaches with several existing methods and show they are able to achieve good accuracy, whilst also having low running time.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/demyanov12.html
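
For orientation, the standard AIC and BIC formulas that the abstract above builds on can be sketched as follows. This is only a minimal illustration of the textbook definitions; it does not reproduce the paper's SVM-specific likelihood constructions, and the candidate names, log-likelihoods, parameter counts, and sample size are all hypothetical.

```python
import math

def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood

def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood

# Hypothetical candidates: (label, log-likelihood, number of free parameters).
candidates = [("model_a", -120.0, 3), ("model_b", -113.0, 8)]
n = 200  # hypothetical number of training samples

# Lower criterion values are preferred under both AIC and BIC.
best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n))
```

With these made-up numbers the two criteria disagree, because BIC penalizes each extra parameter by ln(n) rather than by 2.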

A Ranking-based KNN Approach for Multi-Label Classification
Multi-label classification has attracted a great deal of attention in recent years. This paper presents an interesting finding, namely, that being able to identify neighbors with trustable labels can significantly improve the classification accuracy. Based on this finding, we propose a k-nearest-neighbor-based ranking approach to solve the multi-label classification problem. The approach exploits a ranking model to learn which neighbors’ labels are more trustable candidates for a weighted KNN-based strategy, and then assigns higher weights to those candidates when making weighted-voting decisions. The weights can then be determined by using a generalized pattern search technique. We collect several real-world data sets from various domains for the experiment. Our experimental results demonstrate that the proposed method outperforms state-of-the-art instance-based learning approaches. We believe that appropriately exploiting k-nearest neighbors is useful for solving the multi-label problem.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/chiang12.html
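
The weighted-voting step mentioned in the abstract above can be illustrated with a generic sketch. It assumes the neighbor weights are already given; the ranking model and the generalized pattern search that the paper uses to obtain them are not reproduced here, and the function name, threshold, weights, and labels are hypothetical.

```python
from collections import defaultdict

def weighted_knn_labels(neighbors, threshold=0.5):
    """Predict a label set by weighted voting.

    neighbors: list of (weight, set_of_labels) pairs, one per neighbor.
    A label is predicted when the neighbors carrying it hold at least
    `threshold` of the total weight.
    """
    total = sum(weight for weight, _ in neighbors)
    scores = defaultdict(float)
    for weight, labels in neighbors:
        for label in labels:
            scores[label] += weight
    return {label for label, score in scores.items() if score / total >= threshold}

# Hypothetical neighbors: more trusted neighbors receive larger weights.
neighbors = [(4.0, {"sports", "news"}), (2.0, {"news"}), (1.0, {"finance"})]
predicted = weighted_knn_labels(neighbors)
```

In this toy case the label carried only by the lowest-weight neighbor falls below the threshold and is left out of the prediction.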

A Coupled Indian Buffet Process Model for Collaborative Filtering
The dramatic rate at which new digital content becomes available has brought collaborative filtering systems to the epicenter of computer science research in the last decade. In this paper, we propose a novel methodology for rating prediction utilizing concepts from the field of Bayesian nonparametrics. The basic concept that underlies our approach is that each user rates a presented item based on the latent genres of the item and the latent interests of the user. Each item may belong to more than one genre, and each user may belong to more than one latent interest class. The number of existing latent genres and interests is not known beforehand, but should be inferred in a data-driven fashion. We devise a novel hierarchical factor analysis model to formulate our approach under these assumptions. We impose suitable priors over the allocation of items into genres, and users into interests; specifically, we utilize a novel scheme which comprises two coupled Indian buffet process priors that allow the number of latent classes (genres/interests) to be automatically inferred. We experiment on a large set of real ratings data, and show that our approach outperforms four common baselines, including two very competitive state-of-the-art approaches.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/chatzis12.html

Local Kernel Density Ratio-Based Feature Selection for Outlier Detection
Selecting features is an important step of any machine learning task, though most of the focus has been on choosing features relevant for classification and regression. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection which enhances outlier detection. Our proposed method seeks the subset of features that represents the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of this feature selection algorithm compared to popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/azmandian12.html

Learning Latent Variable Models by Pairwise Cluster Comparison
Identification of the latent variables that govern a problem and of the relationships among them, given measurements in the observed world, is important for causal discovery. This identification can be made by analyzing constraints imposed by the latents in the measurements. We introduce the concept of pairwise cluster comparison (PCC) to identify causal relationships from clusters and a two-stage algorithm, called LPCC, that learns a latent variable model (LVM) using PCC. First, LPCC learns the exogenous and the collider latents, as well as their observed descendants, by utilizing pairwise comparisons between clusters in the measurement space that may explain latent causes. Second, LPCC learns the non-collider endogenous latents and their children by splitting these latents from their previously learned latent ancestors. LPCC is not limited to linear or latent-tree models and does not make assumptions about the distribution. Using simulated and real-world datasets, we show that LPCC improves accuracy with the sample size, can learn large LVMs, and is accurate in learning compared to state-of-the-art algorithms.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/asbeh12.html

Multiresolution Mixture Modeling using Merging of Mixture Components
Observing natural phenomena at several levels of detail results in multiresolution data. Extending models and algorithms to cope with multiresolution data is a prerequisite for widespread exploitation of the data represented in the multiple resolutions. Mixture models are widely used probabilistic models; however, mixture models in their standard form can only be used to analyze data represented in a single resolution. In this paper, we propose a multiresolution mixture model based on merging of the mixture components across models represented in different resolutions. The result of such an analysis is multiple mixture models, one for each resolution of the data. Our proposed solution is based on the idea of interaction between mixture models. More specifically, we repeatedly merge component distributions of mixture models across different resolutions. We evaluate our proposed algorithm on two real-world chromosomal aberration datasets represented in two different resolutions. Results show an improvement over the compared multiresolution settings.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/adhikari12.html

A Note on Metric Properties for Some Divergence Measures: The Gaussian Case
Multivariate Gaussian densities are pervasive in pattern recognition and machine learning. A central operation that appears in most of these areas is to measure the difference between two multivariate Gaussians. Unfortunately, traditional measures based on the Kullback-Leibler (KL) divergence and the Bhattacharyya distance do not satisfy all metric axioms necessary for many algorithms. In this paper we propose a modification of the KL divergence and the Bhattacharyya distance, for multivariate Gaussian densities, that transforms the two measures into distance metrics. Next, we show how these metric axioms impact the unfolding process of manifold learning algorithms. Finally, we illustrate the efficacy of the proposed metrics on two different manifold learning algorithms when used for motion clustering in video data. Our results show that, in this particular application, the newly proposed metrics lead to boosts in performance (at least 7%) when compared to other divergence measures.
Sat, 17 Nov 2012 00:00:00 +0000
http://proceedings.mlr.press/v25/aboumoustafa12.html
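
One metric axiom that the KL divergence fails, as the abstract above notes, is symmetry. The sketch below illustrates this with the closed-form KL divergence between two univariate Gaussians, a special case chosen here for brevity; the abstract concerns the multivariate case, and the paper's proposed modification is not reproduced. The function name and the parameter values are hypothetical.

```python
import math

def kl_gaussian(mean_p: float, std_p: float, mean_q: float, std_q: float) -> float:
    """KL(P || Q) for univariate Gaussians P = N(mean_p, std_p^2), Q = N(mean_q, std_q^2)."""
    return (
        math.log(std_q / std_p)
        + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2 * std_q ** 2)
        - 0.5
    )

# Same mean, different spread: the two directions give different values,
# so KL divergence is not symmetric and therefore not a metric.
kl_pq = kl_gaussian(0.0, 1.0, 0.0, 2.0)
kl_qp = kl_gaussian(0.0, 2.0, 0.0, 1.0)
```

The divergence of a distribution from itself is zero, but swapping the arguments changes the value whenever the two variances differ.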