- title: 'Preface' abstract: 'Preface to the Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics May 13-15, 2010, Chia Laguna Resort, Sardinia, Italy.' volume: 9 URL: https://proceedings.mlr.press/v9/teh10a.html PDF: http://proceedings.mlr.press/v9/teh10a/teh10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-teh10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Yee Whye family: Teh - given: Mike family: Titterington editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: i-v id: teh10a issued: date-parts: - 2010 - 3 - 31 firstpage: i lastpage: v published: 2010-03-31 00:00:00 +0000 - title: 'Learning the Structure of Deep Sparse Graphical Models' abstract: 'Deep belief networks are a powerful way to model complex probability distributions. However, it is difficult to learn the structure of a belief network, particularly one with hidden units. The Indian buffet process has been used as a nonparametric Bayesian prior on the structure of a directed belief network with a single infinitely wide hidden layer. Here, we introduce the cascading Indian buffet process (CIBP), which provides a prior on the structure of a layered, directed belief network that is unbounded in both depth and width, yet allows tractable inference. We use the CIBP prior with the nonlinear Gaussian belief network framework to allow each unit to vary its behavior between discrete and continuous representations. We use Markov chain Monte Carlo for inference in this model and explore the structures learned on image data.' volume: 9 URL: https://proceedings.mlr.press/v9/adams10a.html PDF: http://proceedings.mlr.press/v9/adams10a/adams10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-adams10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ryan P. family: Adams - given: Hanna family: Wallach - given: Zoubin family: Ghahramani editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 1-8 id: adams10a issued: date-parts: - 2010 - 3 - 31 firstpage: 1 lastpage: 8 published: 2010-03-31 00:00:00 +0000 - title: 'Optimal Allocation Strategies for the Dark Pool Problem' abstract: 'We study the problem of allocating stocks to dark pools. We propose and analyze an optimal approach for allocations, if continuous-valued allocations are allowed. We also propose a modification for the case when only integer-valued allocations are possible. We extend the previous work on this problem (Ganchev et al., 2009) to adversarial scenarios, while also improving over their results in the iid setup. The resulting algorithms are efficient, and perform well in simulations under stochastic and adversarial inputs.' 
volume: 9 URL: https://proceedings.mlr.press/v9/agarwal10a.html PDF: http://proceedings.mlr.press/v9/agarwal10a/agarwal10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-agarwal10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Alekh family: Agarwal - given: Peter family: Bartlett - given: Max family: Dama editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 9-16 id: agarwal10a issued: date-parts: - 2010 - 3 - 31 firstpage: 9 lastpage: 16 published: 2010-03-31 00:00:00 +0000 - title: 'Multitask Learning for Brain-Computer Interfaces' abstract: 'Brain-computer interfaces (BCIs) are limited in their applicability in everyday settings by the current necessity to record subject-specific calibration data prior to actual use of the BCI for communication. In this paper, we utilize the framework of multitask learning to construct a BCI that can be used without any subject-specific calibration process. We discuss how this out-of-the-box BCI can be further improved in a computationally efficient manner as subject-specific data becomes available. The feasibility of the approach is demonstrated on two sets of experimental EEG data recorded during a standard two-class motor imagery paradigm from a total of 19 healthy subjects. Specifically, we show that satisfactory classification results can be achieved with zero training data, and combining prior recordings with subject-specific calibration data substantially outperforms using subject-specific data only. Our results further show that transfer between recordings under slightly different experimental setups is feasible.' volume: 9 URL: https://proceedings.mlr.press/v9/alamgir10a.html PDF: http://proceedings.mlr.press/v9/alamgir10a/alamgir10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-alamgir10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Morteza family: Alamgir - given: Moritz family: Grosse–Wentrup - given: Yasemin family: Altun editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 17-24 id: alamgir10a issued: date-parts: - 2010 - 3 - 31 firstpage: 17 lastpage: 24 published: 2010-03-31 00:00:00 +0000 - title: 'Efficient Multioutput Gaussian Processes through Variational Inducing Kernels' abstract: 'Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way to construct such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Alvarez and Lawrence recently presented a sparse approximation for CPs that enabled efficient inference. 
In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.' volume: 9 URL: https://proceedings.mlr.press/v9/alvarez10a.html PDF: http://proceedings.mlr.press/v9/alvarez10a/alvarez10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-alvarez10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Mauricio family: Álvarez - given: David family: Luengo - given: Michalis family: Titsias - given: Neil D. family: Lawrence editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 25-32 id: alvarez10a issued: date-parts: - 2010 - 3 - 31 firstpage: 25 lastpage: 32 published: 2010-03-31 00:00:00 +0000 - title: 'Learning with Blocks: Composite Likelihood and Contrastive Divergence' abstract: 'Composite likelihood methods provide a wide spectrum of computationally efficient techniques for statistical tasks such as parameter estimation and model selection. In this paper, we present a formal connection between the optimization of composite likelihoods and the well-known contrastive divergence algorithm. In particular, we show that composite likelihoods can be stochastically optimized by performing a variant of contrastive divergence with random-scan blocked Gibbs sampling. By using higher-order composite likelihoods, our proposed learning framework makes it possible to trade off computation time for increased accuracy. Furthermore, one can choose composite likelihood blocks that match the model’s dependence structure, making the optimization of higher-order composite likelihoods computationally efficient. We empirically analyze the performance of blocked contrastive divergence on various models, including visible Boltzmann machines, conditional random fields, and exponential random graph models, and we demonstrate that using higher-order blocks improves both the accuracy of parameter estimates and the rate of convergence.' volume: 9 URL: https://proceedings.mlr.press/v9/asuncion10a.html PDF: http://proceedings.mlr.press/v9/asuncion10a/asuncion10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-asuncion10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Arthur family: Asuncion - given: Qiang family: Liu - given: Alexander family: Ihler - given: Padhraic family: Smyth editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 33-40 id: asuncion10a issued: date-parts: - 2010 - 3 - 31 firstpage: 33 lastpage: 40 published: 2010-03-31 00:00:00 +0000 - title: 'Deterministic Bayesian inference for the $p*$ model' abstract: 'The $p*$ model is widely used in social network analysis. The likelihood of a network under this model is impossible to calculate for all but trivially small networks. 
Various approximations have been presented in the literature, of which the pseudolikelihood approximation is the most popular. The aim of this paper is to introduce two likelihood approximations which have the pseudolikelihood estimator as a special case. We show, for the examples that we have considered, that both approximations result in improved estimation of model parameters with respect to the standard methodological approaches. We provide a deterministic approach and also illustrate how Bayesian model choice can be carried out in this setting.' volume: 9 URL: https://proceedings.mlr.press/v9/austad10a.html PDF: http://proceedings.mlr.press/v9/austad10a/austad10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-austad10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Haakon family: Austad - given: Nial family: Friel editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 41-48 id: austad10a issued: date-parts: - 2010 - 3 - 31 firstpage: 41 lastpage: 48 published: 2010-03-31 00:00:00 +0000 - title: 'Half Transductive Ranking' abstract: 'We study the standard retrieval task of ranking a fixed set of items given a previously unseen query and pose it as the half transductive ranking problem. The task is transductive as the set of items is fixed. Transductive representations (where the vector representation of each example is learned) allow the generation of highly nonlinear embeddings that capture object relationships without relying on a specific choice of features, and require only relatively simple optimization. Unfortunately, they have no direct out-of-sample extension. Inductive approaches, on the other hand, allow for the representation of unknown queries. We describe algorithms for this setting which have the advantages of both transductive and inductive approaches, and can be applied in unsupervised (either reconstruction-based or graph-based) and supervised ranking setups. We show empirically that our methods give strong performance on all three tasks.' volume: 9 URL: https://proceedings.mlr.press/v9/bai10a.html PDF: http://proceedings.mlr.press/v9/bai10a/bai10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-bai10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Bing family: Bai - given: Jason family: Weston - given: David family: Grangier - given: Ronan family: Collobert - given: Corinna family: Cortes - given: Mehryar family: Mohri editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 49-56 id: bai10a issued: date-parts: - 2010 - 3 - 31 firstpage: 49 lastpage: 56 published: 2010-03-31 00:00:00 +0000 - title: 'Kernel Partial Least Squares is Universally Consistent' abstract: 'We prove the statistical consistency of kernel Partial Least Squares Regression applied to a bounded regression learning problem on a reproducing kernel Hilbert space. Partial Least Squares stands out from well-known classical approaches such as
Ridge Regression or Principal Components Regression, as it is not defined as the solution of a global cost minimization procedure over a fixed model nor is it a linear estimator. Instead, approximate solutions are constructed by projections onto a nested set of data-dependent subspaces. To prove consistency, we exploit the known fact that Partial Least Squares is equivalent to the conjugate gradient algorithm in combination with early stopping. The choice of the stopping rule (number of iterations) is a crucial point. We study two empirical stopping rules. The first one monitors the estimation error in each iteration step of Partial Least Squares, and the second one estimates the empirical complexity in terms of a condition number. Both stopping rules lead to universally consistent estimators provided the kernel is universal.' volume: 9 URL: https://proceedings.mlr.press/v9/blanchard10a.html PDF: http://proceedings.mlr.press/v9/blanchard10a/blanchard10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-blanchard10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Gilles family: Blanchard - given: Nicole family: Krämer editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 57-64 id: blanchard10a issued: date-parts: - 2010 - 3 - 31 firstpage: 57 lastpage: 64 published: 2010-03-31 00:00:00 +0000 - title: 'Towards Understanding Situated Natural Language' abstract: 'We present a general framework and learning algorithm for the task of concept labeling: each word in a given sentence has to be tagged with the unique physical entity (e.g. person, object or location) or abstract concept it refers to. Our method allows both world knowledge and linguistic information to be used during learning and prediction. We show experimentally that we can learn to use world knowledge to resolve ambiguities in language, such as word senses or reference resolution, without the use of handcrafted rules or features.' volume: 9 URL: https://proceedings.mlr.press/v9/bordes10a.html PDF: http://proceedings.mlr.press/v9/bordes10a/bordes10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-bordes10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Antoine family: Bordes - given: Nicolas family: Usunier - given: Ronan family: Collobert - given: Jason family: Weston editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 65-72 id: bordes10a issued: date-parts: - 2010 - 3 - 31 firstpage: 65 lastpage: 72 published: 2010-03-31 00:00:00 +0000 - title: 'Using Descendants as Instrumental Variables for the Identification of Direct Causal Effects in Linear SEMs' abstract: 'In this paper, we present an extended set of graphical criteria for the identification of direct causal effects in linear Structural Equation Models (SEMs). Previous methods of graphical identification of direct causal effects in linear SEMs include methods such as the single-door criterion, the instrumental variable and the IV-pair, and the accessory set. 
However, there remain graphical models where a direct causal effect can be identified yet all of these graphical criteria fail. To address this, we introduce a new set of graphical criteria which uses descendants of either the cause variable or the effect variable as “path-specific instrumental variables” for the identification of the direct causal effect as long as certain conditions are satisfied. These conditions are based on edge removal, the existing graphical criteria for instrumental variables, and the identifiability of certain other total effects, and can thus be easily checked.' volume: 9 URL: https://proceedings.mlr.press/v9/chan10a.html PDF: http://proceedings.mlr.press/v9/chan10a/chan10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chan10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Hei family: Chan - given: Manabu family: Kuroki editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 73-80 id: chan10a issued: date-parts: - 2010 - 3 - 31 firstpage: 73 lastpage: 80 published: 2010-03-31 00:00:00 +0000 - title: 'Why are DBNs sparse?' abstract: 'Real stochastic processes operate in continuous time and can be modeled by sets of stochastic differential equations. On the other hand, several popular model families, including hidden Markov models and dynamic Bayesian networks (DBNs), use discrete time steps. This paper explores methods for converting DBNs with infinitesimal time steps into DBNs with finite time steps, to enable efficient simulation and filtering over long periods. An exact conversion—summing out all intervening time slices between two steps—results in a completely connected DBN, yet nearly all human-constructed DBNs are sparse. We show how this sparsity arises from well-founded approximations resulting from differences among the natural time scales of the variables in the DBN. We define an automated procedure for constructing a provably accurate, approximate DBN model for any desired time step. We illustrate the method by generating a series of approximations to a simple pH model for the human body, demonstrating speedups of several orders of magnitude compared to the original model.' volume: 9 URL: https://proceedings.mlr.press/v9/chatterjee10a.html PDF: http://proceedings.mlr.press/v9/chatterjee10a/chatterjee10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chatterjee10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Shaunak family: Chatterjee - given: Stuart family: Russell editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 81-88 id: chatterjee10a issued: date-parts: - 2010 - 3 - 31 firstpage: 81 lastpage: 88 published: 2010-03-31 00:00:00 +0000 - title: 'Focused Belief Propagation for Query-Specific Inference' abstract: 'With the increasing popularity of large-scale probabilistic graphical models, even “lightweight” approximate inference methods are becoming infeasible. Fortunately, large parts of the model are often of no immediate interest to the end user.
Given the variable that the user actually cares about, we show how to quantify edge importance in graphical models and how to significantly speed up inference by focusing computation on the important parts of the model. Empirically, our algorithm achieves a severalfold convergence speedup over the state of the art.' volume: 9 URL: https://proceedings.mlr.press/v9/chechetka10a.html PDF: http://proceedings.mlr.press/v9/chechetka10a/chechetka10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chechetka10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Anton family: Chechetka - given: Carlos family: Guestrin editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 89-96 id: chechetka10a issued: date-parts: - 2010 - 3 - 31 firstpage: 89 lastpage: 96 published: 2010-03-31 00:00:00 +0000 - title: 'Parametric Herding' abstract: 'A parametric version of herding is formulated. The nonlinear mapping between consecutive time slices is learned by a form of self-supervised training. The resulting dynamical system generates pseudo-samples that resemble the original data. We show how this parametric herding can be successfully used to compress a dataset consisting of binary digits. It is also verified that high compression rates translate into good prediction performance on unseen test data.' volume: 9 URL: https://proceedings.mlr.press/v9/chen10a.html PDF: http://proceedings.mlr.press/v9/chen10a/chen10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chen10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Yutian family: Chen - given: Max family: Welling editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 97-104 id: chen10a issued: date-parts: - 2010 - 3 - 31 firstpage: 97 lastpage: 104 published: 2010-03-31 00:00:00 +0000 - title: 'Mass Fatality Incident Identification based on nuclear DNA evidence' abstract: 'This paper focuses on the use of nuclear DNA Short Tandem Repeat traits for the identification of the victims of a Mass Fatality Incident. The goal of the analysis is the assessment of the identification probabilities concerning the recovered victims. Identification hypotheses are evaluated conditionally on the DNA evidence observed both on the recovered victims and on the relatives of the missing persons who disappeared in the tragic event. After specifying a set of conditional independence assertions suitable for the problem, an inference strategy is provided, addressing several points so as to achieve computational efficiency. Finally, the proposal is tested through the simulation of a Mass Fatality Incident and the results are examined in detail.'
volume: 9 URL: https://proceedings.mlr.press/v9/corradi10a.html PDF: http://proceedings.mlr.press/v9/corradi10a/corradi10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-corradi10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Fabio family: Corradi editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 105-112 id: corradi10a issued: date-parts: - 2010 - 3 - 31 firstpage: 105 lastpage: 112 published: 2010-03-31 00:00:00 +0000 - title: 'On the Impact of Kernel Approximation on Learning Accuracy' abstract: 'Kernel approximation is commonly used to scale kernel-based algorithms to applications containing as many as several million instances. This paper analyzes the effect of such approximations in the kernel matrix on the hypothesis generated by several widely used learning algorithms. We give stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, and graph Laplacian-based regularization algorithms. These bounds help determine the degree of approximation that can be tolerated in the estimation of the kernel matrix. Our analysis is general and applies to arbitrary approximations of the kernel matrix. However, we also give a specific analysis of the Nyström low-rank approximation in this context and report the results of experiments evaluating the quality of the Nyström low-rank kernel approximation when used with ridge regression.' volume: 9 URL: https://proceedings.mlr.press/v9/cortes10a.html PDF: http://proceedings.mlr.press/v9/cortes10a/cortes10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-cortes10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Corinna family: Cortes - given: Mehryar family: Mohri - given: Ameet family: Talwalkar editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 113-120 id: cortes10a issued: date-parts: - 2010 - 3 - 31 firstpage: 113 lastpage: 120 published: 2010-03-31 00:00:00 +0000 - title: 'Improving posterior marginal approximations in latent Gaussian models' abstract: 'We consider the problem of correcting the posterior marginal approximations computed by expectation propagation and Laplace approximation in latent Gaussian models and propose correction methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace approximation by using a parallel updating scheme. In some cases where the Laplace approximation fails, expectation propagation gives excellent estimates. Inspired by bounds on the marginal corrections, we arrive at factorized approximations, which can be applied on top of both expectation propagation and Laplace. These give results nearly indistinguishable from the non-factorized approximations in a fraction of the time.'
volume: 9 URL: https://proceedings.mlr.press/v9/cseke10a.html PDF: http://proceedings.mlr.press/v9/cseke10a/cseke10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-cseke10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Botond family: Cseke - given: Tom family: Heskes editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 121-128 id: cseke10a issued: date-parts: - 2010 - 3 - 31 firstpage: 121 lastpage: 128 published: 2010-03-31 00:00:00 +0000 - title: 'Impossibility Theorems for Domain Adaptation' abstract: 'The domain adaptation problem in machine learning occurs when the test data generating distribution differs from the one that generates the training data. It is clear that the success of learning under such circumstances depends on similarities between the two data distributions. We study assumptions about the relationship between the two distributions that are needed for domain adaptation learning to succeed. We analyze the assumptions in an agnostic PAC-style learning model for the setting in which the learner can access a labeled training data sample and an unlabeled sample generated by the test data distribution. We focus on three assumptions: (i) Similarity between the unlabeled distributions, (ii) Existence of a classifier in the hypothesis class with low error on both training and testing distributions, and (iii) The covariate shift assumption, i.e., the assumption that the conditional label distribution (for each data point) is the same for both the training and test distributions. We show that without either assumption (i) or (ii), the combination of the remaining assumptions is not sufficient to guarantee successful learning. Our negative results hold with respect to any domain adaptation learning algorithm, as long as it does not have access to target labeled examples. In particular, we provide formal proofs that the popular covariate shift assumption is rather weak and does not relieve the necessity of the other assumptions. We also discuss the intuitively appealing paradigm of reweighting the labeled training sample according to the target unlabeled distribution. We show that, somewhat counterintuitively, this paradigm cannot be trusted in the following sense. There are DA tasks that are indistinguishable as far as the input training data goes, but in which reweighting leads to significant improvement in one task while causing dramatic deterioration of learning success in the other.'
volume: 9 URL: https://proceedings.mlr.press/v9/david10a.html PDF: http://proceedings.mlr.press/v9/david10a/david10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-david10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Shai Ben family: David - given: Tyler family: Lu - given: Teresa family: Luu - given: David family: Pal editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 129-136 id: david10a issued: date-parts: - 2010 - 3 - 31 firstpage: 129 lastpage: 136 published: 2010-03-31 00:00:00 +0000 - title: 'Multiclass-Multilabel Classification with More Classes than Examples' abstract: 'We discuss multiclass-multilabel classification problems in which the set of possible labels is extremely large. Most existing multiclass-multilabel learning algorithms expect to observe a reasonably large sample from each class, and fail if they receive only a handful of examples with a given label. We propose and analyze the following two-stage approach: first use an arbitrary (perhaps heuristic) classification algorithm to construct an initial classifier, then apply a simple but principled method to augment this classifier by removing harmful labels from its output. A careful theoretical analysis allows us to justify our approach under some reasonable conditions (such as label sparsity and power-law distribution of label frequencies), even when the training set does not provide a statistically accurate representation of most classes. Surprisingly, our theoretical analysis continues to hold even when the number of classes exceeds the sample size. We demonstrate the merits of our approach on the ambitious task of categorizing the entire web using the 1.5 million categories defined on Wikipedia.' volume: 9 URL: https://proceedings.mlr.press/v9/dekel10a.html PDF: http://proceedings.mlr.press/v9/dekel10a/dekel10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-dekel10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ofer family: Dekel - given: Ohad family: Shamir editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 137-144 id: dekel10a issued: date-parts: - 2010 - 3 - 31 firstpage: 137 lastpage: 144 published: 2010-03-31 00:00:00 +0000 - title: 'Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines' abstract: 'Alternating Gibbs sampling is the most common scheme used for sampling from Restricted Boltzmann Machines (RBM), a crucial component in deep architectures such as Deep Belief Networks. However, we find that it often does a very poor job of rendering the diversity of modes captured by the trained model. We suspect that this hinders the advantage that could in principle be brought by training algorithms that rely on Gibbs sampling to uncover spurious modes, such as the Persistent Contrastive Divergence algorithm. To alleviate this problem, we explore the use of tempered Markov chain Monte Carlo for sampling in RBMs. Through both visualization of samples and measures of likelihood on a toy dataset, we find that it helps both sampling and learning.'
volume: 9 URL: https://proceedings.mlr.press/v9/desjardins10a.html PDF: http://proceedings.mlr.press/v9/desjardins10a/desjardins10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-desjardins10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Guillaume family: Desjardins - given: Aaron family: Courville - given: Yoshua family: Bengio - given: Pascal family: Vincent - given: Olivier family: Delalleau editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 145-152 id: desjardins10a issued: date-parts: - 2010 - 3 - 31 firstpage: 145 lastpage: 152 published: 2010-03-31 00:00:00 +0000 - title: 'Feature Selection using Multiple Streams' abstract: 'Feature selection for supervised learning can be greatly improved by making use of the fact that features often come in classes. For example, in gene expression data, the genes which serve as features may be divided into classes based on their membership in gene families or pathways. When labeling words with senses for word sense disambiguation, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are of different sizes and have different (but unknown) fractions of good features. Experimental results show that our approach provides significant improvement in performance and is computationally less expensive than comparable “batch” methods that do not take advantage of the feature classes and expect all features to be known in advance.' volume: 9 URL: https://proceedings.mlr.press/v9/dhillon10a.html PDF: http://proceedings.mlr.press/v9/dhillon10a/dhillon10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-dhillon10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Paramveer family: Dhillon - given: Dean family: Foster - given: Lyle family: Ungar editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 153-160 id: dhillon10a issued: date-parts: - 2010 - 3 - 31 firstpage: 153 lastpage: 160 published: 2010-03-31 00:00:00 +0000 - title: 'Bayesian variable order Markov models' abstract: 'We present a simple, effective generalisation of variable order Markov models to full online Bayesian estimation. The mechanism used is close to that employed in context tree weighting. The main contribution is the addition of a prior, conditioned on context, on the Markov order. The resulting construction uses a simple recursion and can be updated efficiently. This allows the model to make predictions using more complex contexts, as more data is acquired, if necessary. In addition, our model can be alternatively seen as a mixture of tree experts. Experimental results show that the predictive model exhibits consistently good performance in a variety of domains.' 
volume: 9 URL: https://proceedings.mlr.press/v9/dimitrakakis10a.html PDF: http://proceedings.mlr.press/v9/dimitrakakis10a/dimitrakakis10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-dimitrakakis10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Christos family: Dimitrakakis editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 161-168 id: dimitrakakis10a issued: date-parts: - 2010 - 3 - 31 firstpage: 161 lastpage: 168 published: 2010-03-31 00:00:00 +0000 - title: 'Nonparametric Bayesian Matrix Factorization by Power-EP' abstract: 'Many real-world applications can be modeled by matrix factorization. By approximating an observed data matrix as the product of two latent matrices, matrix factorization can reveal hidden structures embedded in data. A common challenge in using matrix factorization is determining the dimensionality of the latent matrices from data. Indian Buffet Processes (IBPs) enable us to apply the nonparametric Bayesian machinery to address this challenge. However, it remains a difficult task to learn nonparametric Bayesian matrix factorization models. In this paper, we propose a novel variational Bayesian method based on new equivalence classes of infinite matrices for learning these models. Furthermore, inspired by the success of nonnegative matrix factorization on many learning problems, we impose nonnegativity constraints on the latent matrices and mix variational inference with expectation propagation. This mixed inference method is unified in a power expectation propagation framework. Experimental results on image decomposition demonstrate the superior computational efficiency and the higher prediction accuracy of our methods compared to alternative Monte Carlo and variational inference methods for IBP models. We also apply the new methods to collaborative filtering and role mining and show improved predictive performance over other matrix factorization methods.' volume: 9 URL: https://proceedings.mlr.press/v9/ding10a.html PDF: http://proceedings.mlr.press/v9/ding10a/ding10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ding10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Nan family: Ding - given: Yuan family: Qi - given: Rongjing family: Xiang - given: Ian family: Molloy - given: Ninghui family: Li editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 169-176 id: ding10a issued: date-parts: - 2010 - 3 - 31 firstpage: 169 lastpage: 176 published: 2010-03-31 00:00:00 +0000 - title: 'Neural conditional random fields' abstract: 'We propose a non-linear graphical model for structured prediction. It combines the power of deep neural networks to extract high-level features with the graphical framework of Markov networks, yielding a powerful and scalable probabilistic model that we apply to signal labeling tasks.'
volume: 9 URL: https://proceedings.mlr.press/v9/do10a.html PDF: http://proceedings.mlr.press/v9/do10a/do10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-do10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Trinh–Minh–Tri family: Do - given: Thierry family: Artieres editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 177-184 id: do10a issued: date-parts: - 2010 - 3 - 31 firstpage: 177 lastpage: 184 published: 2010-03-31 00:00:00 +0000 - title: 'Combining Experiments to Discover Linear Cyclic Models with Latent Variables' abstract: 'We present an algorithm to infer causal relations between a set of measured variables on the basis of experiments on these variables. The algorithm assumes that the causal relations are linear, but is otherwise completely general: It provides consistent estimates when the true causal structure contains feedback loops and latent variables, while the experiments can involve surgical or ‘soft’ interventions on one or multiple variables at a time. The algorithm is ‘online’ in the sense that it combines the results from any set of available experiments, can incorporate background knowledge, and resolves conflicts that arise from combining results from different experiments. In addition we provide a necessary and sufficient condition that (i) determines when the algorithm can uniquely return the true graph, and (ii) can be used to select the next best experiment until this condition is satisfied. We demonstrate the method by applying it to simulated data and the flow cytometry data of Sachs et al. (2005).' volume: 9 URL: https://proceedings.mlr.press/v9/eberhardt10a.html PDF: http://proceedings.mlr.press/v9/eberhardt10a/eberhardt10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-eberhardt10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Frederick family: Eberhardt - given: Patrik family: Hoyer - given: Richard family: Scheines editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 185-192 id: eberhardt10a issued: date-parts: - 2010 - 3 - 31 firstpage: 185 lastpage: 192 published: 2010-03-31 00:00:00 +0000 - title: 'Graphical Gaussian modelling of multivariate time series with latent variables' abstract: 'In time series analysis, inference about cause-effect relationships among multiple time series is commonly based on the concept of Granger causality, which exploits temporal structure to achieve causal ordering of dependent variables. One major problem in the application of Granger causality for the identification of causal relationships is the possible presence of latent variables that affect the measured components and thus lead to so-called spurious causalities. In this paper, we describe a new graphical approach for modelling the dependence structure of multivariate stationary time series that are affected by latent variables. To this end, we introduce dynamic maximal ancestral graphs (dMAGs), in which each time series is represented by a single vertex.
For Gaussian processes, this approach leads to vector autoregressive models with errors that are not independent but correlated according to the dashed edges in the graph. We discuss identifiability of the parameters and show that these models can be viewed as graphical ARMA models that satisfy the Granger causality restrictions encoded by the associated dynamic maximal ancestral graph.' volume: 9 URL: https://proceedings.mlr.press/v9/eichler10a.html PDF: http://proceedings.mlr.press/v9/eichler10a/eichler10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-eichler10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Michael family: Eichler editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 193-200 id: eichler10a issued: date-parts: - 2010 - 3 - 31 firstpage: 193 lastpage: 200 published: 2010-03-31 00:00:00 +0000 - title: 'Why Does Unsupervised Pre-training Help Deep Learning?' abstract: 'Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature, including its action as a regularizer (Erhan et al. 2009) and as an aid to optimization (Bengio et al. 2007). Our results build on the work of Erhan et al. (2009), showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.' volume: 9 URL: https://proceedings.mlr.press/v9/erhan10a.html PDF: http://proceedings.mlr.press/v9/erhan10a/erhan10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-erhan10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Dumitru family: Erhan - given: Aaron family: Courville - given: Yoshua family: Bengio - given: Pascal family: Vincent editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 201-208 id: erhan10a issued: date-parts: - 2010 - 3 - 31 firstpage: 201 lastpage: 208 published: 2010-03-31 00:00:00 +0000 - title: 'Semi-Supervised Learning via Generalized Maximum Entropy' abstract: 'Various supervised inference methods can be analyzed as convex duals of the generalized maximum entropy (MaxEnt) framework. Generalized MaxEnt aims to find a distribution that maximizes an entropy function while respecting prior information represented as potential functions in miscellaneous forms of constraints and/or penalties.
We extend this framework to semi-supervised learning by incorporating unlabeled data via modifications to these potential functions reflecting structural assumptions on the data geometry. The proposed approach leads to a family of discriminative semi-supervised algorithms that are convex, scalable, inherently multi-class, easy to implement, and naturally kernelizable. Experimental evaluation of special cases shows the competitiveness of our methodology.' volume: 9 URL: https://proceedings.mlr.press/v9/erkan10a.html PDF: http://proceedings.mlr.press/v9/erkan10a/erkan10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-erkan10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ayse family: Erkan - given: Yasemin family: Altun editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 209-216 id: erkan10a issued: date-parts: - 2010 - 3 - 31 firstpage: 209 lastpage: 216 published: 2010-03-31 00:00:00 +0000 - title: 'Model-Free Monte Carlo-like Policy Evaluation' abstract: 'We propose an algorithm for estimating the finite-horizon expected return of a closed-loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulative rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.' volume: 9 URL: https://proceedings.mlr.press/v9/fonteneau10a.html PDF: http://proceedings.mlr.press/v9/fonteneau10a/fonteneau10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-fonteneau10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Raphael family: Fonteneau - given: Susan family: Murphy - given: Louis family: Wehenkel - given: Damien family: Ernst editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 217-224 id: fonteneau10a issued: date-parts: - 2010 - 3 - 31 firstpage: 217 lastpage: 224 published: 2010-03-31 00:00:00 +0000 - title: 'A Weighted Multi-Sequence Markov Model For Brain Lesion Segmentation' abstract: 'We propose a technique for fusing the output of multiple Magnetic Resonance (MR) sequences to robustly and accurately segment brain lesions. It is based on an augmented multi-sequence Hidden Markov model that includes additional weight variables to account for the relative importance and control the impact of each sequence. The augmented framework has the advantage of allowing 1) the incorporation of expert knowledge on the a priori relevant information content of each sequence and 2) a weighting scheme which is modified adaptively according to the data and the segmentation task under consideration. The model, applied to the detection of multiple sclerosis and stroke lesions, shows promising results.'
volume: 9 URL: https://proceedings.mlr.press/v9/forbes10a.html PDF: http://proceedings.mlr.press/v9/forbes10a/forbes10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-forbes10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Florence family: Forbes - given: Senan family: Doyle - given: Daniel family: Garcia–Lorenzo - given: Christian family: Barillot - given: Michel family: Dojat editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 225-232 id: forbes10a issued: date-parts: - 2010 - 3 - 31 firstpage: 225 lastpage: 232 published: 2010-03-31 00:00:00 +0000 - title: 'Posterior distributions are computable from predictive distributions' abstract: 'As we devise more complicated prior distributions, will inference algorithms keep up? We highlight a negative result in computable probability theory by Ackerman, Freer, and Roy (2010) that shows that there exist computable priors with noncomputable posteriors. In addition to providing a brief survey of computable probability theory geared towards the A.I. and statistics community, we give a new result characterizing when conditioning is computable in the setting of exchangeable sequences, and provide a computational perspective on work by Orbanz (2010) on conjugate nonparametric models. In particular, using a computable extension of de Finetti’s theorem (Freer and Roy 2009), we describe how to transform a posterior predictive rule for generating an exchangeable sequence into an algorithm for computing the posterior distribution of the directing random measure.' volume: 9 URL: https://proceedings.mlr.press/v9/freer10a.html PDF: http://proceedings.mlr.press/v9/freer10a/freer10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-freer10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Cameron family: Freer - given: Daniel family: Roy editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 233-240 id: freer10a issued: date-parts: - 2010 - 3 - 31 firstpage: 233 lastpage: 240 published: 2010-03-31 00:00:00 +0000 - title: 'Variational methods for Reinforcement Learning' abstract: 'We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained from which the optimal decision policy is formed. The classical maximum likelihood point estimate of the transition model does not reflect the uncertainty in the estimate of the transition model and the resulting policies may consequently lack a sufficient degree of exploration. We consider a Bayesian alternative that maintains a distribution over the transition so that the resulting policy takes into account the limited experience of the environment. The resulting algorithm is formally intractable and we discuss two approximate solution methods, Variational Bayes and Expectation Propagation.' 
volume: 9 URL: https://proceedings.mlr.press/v9/furmston10a.html PDF: http://proceedings.mlr.press/v9/furmston10a/furmston10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-furmston10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Thomas family: Furmston - given: David family: Barber editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 241-248 id: furmston10a issued: date-parts: - 2010 - 3 - 31 firstpage: 241 lastpage: 248 published: 2010-03-31 00:00:00 +0000 - title: 'Understanding the difficulty of training deep feedforward neural networks' abstract: 'Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper versus less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to better understand why standard gradient descent from random initialization does so poorly with deep neural networks, to better understand these recent relative successes and to help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.' volume: 9 URL: https://proceedings.mlr.press/v9/glorot10a.html PDF: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-glorot10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Xavier family: Glorot - given: Yoshua family: Bengio editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 249-256 id: glorot10a issued: date-parts: - 2010 - 3 - 31 firstpage: 249 lastpage: 256 published: 2010-03-31 00:00:00 +0000 - title: 'On Combining Graph-based Variance Reduction schemes' abstract: 'In this paper, we consider two variance reduction schemes that exploit the structure of the primal graph of the graphical model: Rao-Blackwellised w-cutset sampling and AND/OR sampling. We show that the two schemes are orthogonal and can be combined to further reduce the variance. Our combination yields a new family of estimators which trade off time and space against variance.
We demonstrate experimentally that the new estimators are superior, often yielding an order of magnitude improvement over previous schemes on several benchmarks.' volume: 9 URL: https://proceedings.mlr.press/v9/gogate10a.html PDF: http://proceedings.mlr.press/v9/gogate10a/gogate10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-gogate10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Vibhav family: Gogate - given: Rina family: Dechter editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 257-264 id: gogate10a issued: date-parts: - 2010 - 3 - 31 firstpage: 257 lastpage: 264 published: 2010-03-31 00:00:00 +0000 - title: 'Locally Linear Denoising on Image Manifolds' abstract: 'We study the problem of image denoising where images are assumed to be samples from low-dimensional (sub)manifolds. We propose a locally linear denoising algorithm. The algorithm approximates manifolds with locally linear patches by constructing nearest neighbor graphs. Each image is then locally denoised within its neighborhoods. A globally optimal denoising result is then identified by aligning those local estimates. The algorithm has a closed-form solution that is efficient to compute. We evaluated and compared the algorithm to alternative methods on two image data sets, demonstrating the effectiveness of the proposed algorithm: it yields visually appealing denoising results, incurs smaller reconstruction errors, and results in lower error rates when the denoised data are used in supervised learning tasks.' volume: 9 URL: https://proceedings.mlr.press/v9/gong10a.html PDF: http://proceedings.mlr.press/v9/gong10a/gong10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-gong10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Dian family: Gong - given: Fei family: Sha - given: Gérard family: Medioni editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 265-272 id: gong10a issued: date-parts: - 2010 - 3 - 31 firstpage: 265 lastpage: 272 published: 2010-03-31 00:00:00 +0000 - title: 'Regret Bounds for Gaussian Process Bandit Problems' abstract: 'Bandit algorithms are concerned with trading off exploration against exploitation when a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modeled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general there is at most a logarithmic looseness in our upper bounds.'
volume: 9 URL: https://proceedings.mlr.press/v9/grunewalder10a.html PDF: http://proceedings.mlr.press/v9/grunewalder10a/grunewalder10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-grunewalder10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Steffen family: Grünewälder - given: Jean–Yves family: Audibert - given: Manfred family: Opper - given: John family: Shawe–Taylor editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 273-280 id: grunewalder10a issued: date-parts: - 2010 - 3 - 31 firstpage: 273 lastpage: 280 published: 2010-03-31 00:00:00 +0000 - title: 'Sufficient covariates and linear propensity analysis' abstract: 'Working within the decision-theoretic framework for causal inference, we study the properties of “sufficient covariates”, which support causal inference from observational data, and possibilities for their reduction. In particular we illustrate the role of a propensity variable by means of a simple model, and explain why such a reduction typically does not increase (and may reduce) estimation efficiency.' volume: 9 URL: https://proceedings.mlr.press/v9/guo10a.html PDF: http://proceedings.mlr.press/v9/guo10a/guo10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-guo10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Hui family: Guo - given: Philip family: Dawid editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 281-288 id: guo10a issued: date-parts: - 2010 - 3 - 31 firstpage: 281 lastpage: 288 published: 2010-03-31 00:00:00 +0000 - title: 'Real-time Multiattribute Bayesian Preference Elicitation with Pairwise Comparison Queries' abstract: 'Preference elicitation (PE) is an important component of interactive decision support systems that aim to make optimal recommendations to users by actively querying their preferences. In this paper, we outline five principles important for PE in real-world problems: (1) real-time, (2) multiattribute, (3) low cognitive load, (4) robust to noise, and (5) scalable. In light of these requirements, we introduce an approximate PE framework based on TrueSkill for performing efficient closed-form Bayesian updates and query selection for a multiattribute utility belief state — a novel PE approach that naturally facilitates the efficient evaluation of value of information (VOI) heuristics for use in query selection strategies. Our best VOI query strategy satisfies all five principles (in contrast to related work) and performs on par with the most accurate (and often computationally intensive) algorithms on experiments with synthetic and real-world datasets.'
volume: 9 URL: https://proceedings.mlr.press/v9/guo10b.html PDF: http://proceedings.mlr.press/v9/guo10b/guo10b.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-guo10b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Shengbo family: Guo - given: Scott family: Sanner editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 289-296 id: guo10b issued: date-parts: - 2010 - 3 - 31 firstpage: 289 lastpage: 296 published: 2010-03-31 00:00:00 +0000 - title: 'Noise-contrastive estimation: A new estimation principle for unnormalized statistical models' abstract: 'We present a new estimation principle for parameterized statistical models. The idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise, using the model log-density function in the regression nonlinearity. We show that this leads to a consistent (convergent) estimator of the parameters, and analyze the asymptotic variance. In particular, the method is shown to directly work for unnormalized models, i.e. models where the density function does not integrate to one. The normalization constant can be estimated just like any other parameter. For a tractable ICA model, we compare the method with other estimation methods that can be used to learn unnormalized models, including score matching, contrastive divergence, and maximum-likelihood where the normalization constant is estimated with importance sampling. Simulations show that noise-contrastive estimation offers the best trade-off between computational and statistical efficiency. The method is then applied to the modeling of natural images: We show that the method can successfully estimate a large-scale two-layer model and a Markov random field.' volume: 9 URL: https://proceedings.mlr.press/v9/gutmann10a.html PDF: http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-gutmann10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Michael family: Gutmann - given: Aapo family: Hyvärinen editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 297-304 id: gutmann10a issued: date-parts: - 2010 - 3 - 31 firstpage: 297 lastpage: 304 published: 2010-03-31 00:00:00 +0000 - title: 'Boosted Optimization for Network Classification' abstract: 'In this paper we propose a new classification algorithm designed for application on complex networks, motivated by algorithmic similarities between boosted learning and message passing. We consider a network classifier as a logistic regression where the variables define the nodes and the interaction effects define the edges. From this definition we represent the problem as a factor graph of local exponential loss functions. Using the factor graph representation it is possible to interpret the network classifier as an ensemble of individual node classifiers. We then combine ideas from boosted learning with network optimization algorithms to define two novel algorithms, Boosted Expectation Propagation (BEP) and Boosted Message Passing (BMP).
These algorithms optimize the global network classifier performance by locally weighting each node classifier by the error of the surrounding network structure. We compare the performance of BEP and BMP to logistic regression as well as to state-of-the-art penalized logistic regression models on simulated grid structured networks. The results show that using local boosting to optimize the performance of a network classifier increases classification performance and is especially powerful in cases when the whole network structure must be considered for accurate classification.' volume: 9 URL: https://proceedings.mlr.press/v9/hancock10a.html PDF: http://proceedings.mlr.press/v9/hancock10a/hancock10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hancock10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Timothy family: Hancock - given: Hiroshi family: Mamitsuka editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 305-312 id: hancock10a issued: date-parts: - 2010 - 3 - 31 firstpage: 305 lastpage: 312 published: 2010-03-31 00:00:00 +0000 - title: 'Dirichlet Process Mixtures of Generalized Linear Models' abstract: 'We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLMs), a new method of nonparametric regression that accommodates continuous and categorical inputs and models the response variable locally by a generalized linear model. We give conditions for the existence and asymptotic unbiasedness of the DP-GLM regression mean function estimate; we then give a practical example for when those conditions hold. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression including regression trees and Gaussian processes.' volume: 9 URL: https://proceedings.mlr.press/v9/hannah10a.html PDF: http://proceedings.mlr.press/v9/hannah10a/hannah10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hannah10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Lauren family: Hannah - given: David family: Blei - given: Warren family: Powell editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 313-320 id: hannah10a issued: date-parts: - 2010 - 3 - 31 firstpage: 313 lastpage: 320 published: 2010-03-31 00:00:00 +0000 - title: 'Negative Results for Active Learning with Convex Losses' abstract: 'We study the problem of active learning with convex loss functions. We prove that even under bounded noise constraints, the minimax rates for proper active learning are often no better than those of passive learning.'
volume: 9 URL: https://proceedings.mlr.press/v9/hanneke10a.html PDF: http://proceedings.mlr.press/v9/hanneke10a/hanneke10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hanneke10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Steve family: Hanneke - given: Liu family: Yang editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 321-325 id: hanneke10a issued: date-parts: - 2010 - 3 - 31 firstpage: 321 lastpage: 325 published: 2010-03-31 00:00:00 +0000 - title: 'Coherent Inference on Optimal Play in Game Trees' abstract: 'Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.' volume: 9 URL: https://proceedings.mlr.press/v9/hennig10a.html PDF: http://proceedings.mlr.press/v9/hennig10a/hennig10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hennig10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Philipp family: Hennig - given: David family: Stern - given: Thore family: Graepel editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 326-333 id: hennig10a issued: date-parts: - 2010 - 3 - 31 firstpage: 326 lastpage: 333 published: 2010-03-31 00:00:00 +0000 - title: 'Collaborative Filtering via Rating Concentration' abstract: 'While most popular collaborative filtering methods use low-rank matrix factorization and parametric density assumptions, this article proposes an approach based on distribution-free concentration inequalities. Using agnostic hierarchical sampling assumptions, functions of observed ratings are provably close to their expectations over query ratings, on average. A joint probability distribution over queries of interest is estimated using maximum entropy regularization. The distribution resides in a convex hull of allowable candidate distributions which satisfy concentration inequalities that stem from the sampling assumptions. The method accurately estimates rating distributions on synthetic and real data and is competitive with low rank and parametric methods which make more aggressive assumptions about the problem.' 
volume: 9 URL: https://proceedings.mlr.press/v9/huang10a.html PDF: http://proceedings.mlr.press/v9/huang10a/huang10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-huang10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Bert family: Huang - given: Tony family: Jebara editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 334-341 id: huang10a issued: date-parts: - 2010 - 3 - 31 firstpage: 334 lastpage: 341 published: 2010-03-31 00:00:00 +0000 - title: 'Maximum-likelihood learning of cumulative distribution functions on graphs' abstract: 'For many applications, a probability model can be expressed more easily as a cumulative distribution function (CDF) than as a probability density or mass function (PDF/PMF). Cumulative distribution networks (CDNs) have recently been proposed as a class of graphical models for CDFs. One advantage of CDF models is the simplicity of representing multivariate heavy-tailed distributions. Examples of fields that can benefit from the use of graphical models for CDFs include climatology and epidemiology, where data may follow extreme value statistics and exhibit spatial correlations so that dependencies between model variables must be accounted for. The problem of learning from data in such settings may nevertheless consist of optimizing the log-likelihood function with respect to model parameters, where we are required to optimize a log-PDF/PMF and not a log-CDF. We present a message-passing algorithm called the gradient-derivative-product (GDP) algorithm that allows us to learn the model in terms of the log-likelihood function, whereby messages correspond to local gradients of the likelihood with respect to model parameters. We demonstrate the GDP algorithm on real-world rainfall and H1N1 mortality data and show that CDNs provide a natural choice of parameterizations for the heavy-tailed multivariate distributions that arise in these problems.' volume: 9 URL: https://proceedings.mlr.press/v9/huang10b.html PDF: http://proceedings.mlr.press/v9/huang10b/huang10b.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-huang10b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Jim family: Huang - given: Nebojsa family: Jojic editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 342-349 id: huang10b issued: date-parts: - 2010 - 3 - 31 firstpage: 342 lastpage: 349 published: 2010-03-31 00:00:00 +0000 - title: 'Learning Nonlinear Dynamic Models from Non-sequenced Data' abstract: 'Virtually all methods of learning dynamic systems from data start from the same basic assumption: the learning algorithm will be given a sequence, or trajectory, of data generated from the dynamic system. We consider the case where the data is not sequenced. The training data points come from the system’s operation but with no temporal ordering. The data are simply drawn as individual disconnected points. While making this assumption may seem absurd at first glance, many scientific modeling tasks have exactly this property.
Previous work proposed methods for learning linear, discrete-time models under these assumptions by optimizing approximate likelihood functions. In this paper, we extend those methods to nonlinear models using kernel methods. We go on to propose a new approach to solving the problem that focuses on achieving temporal smoothness in the learned dynamics. The result is a convex criterion that can be easily optimized and often outperforms the earlier methods. We test these methods on several synthetic data sets including one generated from the Lorenz attractor.' volume: 9 URL: https://proceedings.mlr.press/v9/huang10c.html PDF: http://proceedings.mlr.press/v9/huang10c/huang10c.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-huang10c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Tzu–Kuo family: Huang - given: Le family: Song - given: Jeff family: Schneider editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 350-357 id: huang10c issued: date-parts: - 2010 - 3 - 31 firstpage: 350 lastpage: 357 published: 2010-03-31 00:00:00 +0000 - title: 'Learning Bayesian Network Structure using LP Relaxations' abstract: 'We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive with or faster than alternative exact methods based on dynamic programming.' volume: 9 URL: https://proceedings.mlr.press/v9/jaakkola10a.html PDF: http://proceedings.mlr.press/v9/jaakkola10a/jaakkola10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-jaakkola10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Tommi family: Jaakkola - given: David family: Sontag - given: Amir family: Globerson - given: Marina family: Meila editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 358-365 id: jaakkola10a issued: date-parts: - 2010 - 3 - 31 firstpage: 358 lastpage: 365 published: 2010-03-31 00:00:00 +0000 - title: 'Structured Sparse Principal Component Analysis' abstract: 'We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by Jenatton et al. (2009).
While classical sparse priors only deal with cardinality, the regularization we use encodes higher-order information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, the denoising of sparse structured signals and face recognition, demonstrate the benefits of the proposed structured approach over unstructured approaches.' volume: 9 URL: https://proceedings.mlr.press/v9/jenatton10a.html PDF: http://proceedings.mlr.press/v9/jenatton10a/jenatton10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-jenatton10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Rodolphe family: Jenatton - given: Guillaume family: Obozinski - given: Francis family: Bach editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 366-373 id: jenatton10a issued: date-parts: - 2010 - 3 - 31 firstpage: 366 lastpage: 373 published: 2010-03-31 00:00:00 +0000 - title: 'Nonlinear functional regression: a functional RKHS approach' abstract: 'This paper deals with functional regression, in which the input attributes as well as the response are functions. To deal with this problem, we develop a functional reproducing kernel Hilbert space approach; here, a kernel is an operator acting on a function and yielding a function. We demonstrate basic properties of these functional RKHSs, as well as a representer theorem for this setting; we investigate the construction of kernels; we provide some experimental insight.' volume: 9 URL: https://proceedings.mlr.press/v9/kadri10a.html PDF: http://proceedings.mlr.press/v9/kadri10a/kadri10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kadri10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Hachem family: Kadri - given: Emmanuel family: Duflos - given: Philippe family: Preux - given: Stéphane family: Canu - given: Manuel family: Davy editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 374-380 id: kadri10a issued: date-parts: - 2010 - 3 - 31 firstpage: 374 lastpage: 380 published: 2010-03-31 00:00:00 +0000 - title: 'Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity' abstract: 'The versatility of exponential families, along with their attendant convexity properties, makes them a popular and effective statistical model. A central issue is learning these models in high dimensions when the optimal parameter vector is sparse. This work characterizes a certain strong convexity property of general exponential families, which allows their generalization ability to be quantified. In particular, we show how this property can be used to analyze generic exponential families under $L_1$ regularization.'
volume: 9 URL: https://proceedings.mlr.press/v9/kakade10a.html PDF: http://proceedings.mlr.press/v9/kakade10a/kakade10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kakade10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Sham family: Kakade - given: Ohad family: Shamir - given: Karthik family: Sridharan - given: Ambuj family: Tewari editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 381-388 id: kakade10a issued: date-parts: - 2010 - 3 - 31 firstpage: 381 lastpage: 388 published: 2010-03-31 00:00:00 +0000 - title: 'Collaborative Filtering on a Budget' abstract: 'Matrix factorization is a successful technique for building collaborative filtering systems. While it works well on a large range of problems, it is also known for requiring significant amounts of storage for each user or item to be added to the database. This is a problem whenever the collaborative filtering task is larger than the medium-sized Netflix Prize data. In this paper, we propose a new model for representing and compressing matrix factors via hashing. This allows an essentially unbounded number of users and items to be represented in a pre-defined memory footprint, at a graceful storage/performance trade-off. It allows us to scale recommender systems to very large numbers of users or, conversely, to obtain very good performance even for tiny models (e.g. 400kB of data suffice for a representation of the EachMovie problem). We provide both experimental results and approximation bounds for our compressed representation and we show how this approach can be extended to multipartite problems.' volume: 9 URL: https://proceedings.mlr.press/v9/karatzoglou10a.html PDF: http://proceedings.mlr.press/v9/karatzoglou10a/karatzoglou10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-karatzoglou10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Alexandros family: Karatzoglou - given: Alex family: Smola - given: Markus family: Weimer editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 389-396 id: karatzoglou10a issued: date-parts: - 2010 - 3 - 31 firstpage: 389 lastpage: 396 published: 2010-03-31 00:00:00 +0000 - title: 'Fast Active-set-type Algorithms for $l_1$-regularized Linear Regression' abstract: 'In this paper, we investigate new active-set-type methods for $l_1$-regularized linear regression that overcome some difficulties of existing active set methods. By showing a relationship between $l_1$-regularized linear regression and the linear complementarity problem with bounds, we present a fast active-set-type method, called block principal pivoting. This method accelerates computation by allowing exchanges of several variables among working sets. We further provide an improvement of this method, discuss its properties, and also explain a connection to the structure learning of Gaussian graphical models. Experimental comparisons on synthetic and real data sets show that the proposed method is significantly faster than existing active set methods and competitive with recently developed iterative methods.'
volume: 9 URL: https://proceedings.mlr.press/v9/kim10a.html PDF: http://proceedings.mlr.press/v9/kim10a/kim10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kim10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Jingu family: Kim - given: Haesun family: Park editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 397-404 id: kim10a issued: date-parts: - 2010 - 3 - 31 firstpage: 397 lastpage: 404 published: 2010-03-31 00:00:00 +0000 - title: 'Online Anomaly Detection under Adversarial Impact' abstract: 'Security analysis of learning algorithms is gaining increasing importance, especially since they have become a target of deliberate obstruction in certain applications. Some security-hardened algorithms have been previously proposed for supervised learning; however, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution, we analyze the performance of a particular method—online centroid anomaly detection—in the presence of adversarial noise. Our analysis addresses three key security-related issues: derivation of an optimal attack, analysis of its efficiency and constraints. Experimental evaluation carried out on real HTTP and exploit traces confirms the tightness of our theoretical bounds.' volume: 9 URL: https://proceedings.mlr.press/v9/kloft10a.html PDF: http://proceedings.mlr.press/v9/kloft10a/kloft10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kloft10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Marius family: Kloft - given: Pavel family: Laskov editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 405-412 id: kloft10a issued: date-parts: - 2010 - 3 - 31 firstpage: 405 lastpage: 412 published: 2010-03-31 00:00:00 +0000 - title: 'Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach' abstract: 'We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure to perform variable selection in ultra-high dimensional multiple output regression problems, which is the first attempt to utilize multiple outputs to perform fast removal of irrelevant variables. As our main theoretical contribution, we show that the S-OMP can be used to reduce an ultra-high number of variables to below the sample size, without losing relevant variables. We also provide formal evidence that the modified Bayesian information criterion (BIC) can be used to efficiently select the number of iterations in the S-OMP. Once the number of variables has been reduced to a manageable size, we show that a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. We further provide evidence on the benefit of variable selection using the regression outputs jointly, as opposed to performing variable selection for each output separately. The finite sample performance of the S-OMP is demonstrated in extensive simulation studies.'
volume: 9 URL: https://proceedings.mlr.press/v9/kolar10a.html PDF: http://proceedings.mlr.press/v9/kolar10a/kolar10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kolar10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Mladen family: Kolar - given: Eric family: Xing editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 413-420 id: kolar10a issued: date-parts: - 2010 - 3 - 31 firstpage: 413 lastpage: 420 published: 2010-03-31 00:00:00 +0000 - title: 'Semi-Supervised Learning with Max-Margin Graph Cuts' abstract: 'This paper proposes a novel algorithm for semi-supervised learning. This algorithm learns graph cuts that maximize the margin with respect to the labels induced by the harmonic function solution. We motivate the approach, compare it to existing work, and prove a bound on its generalization error. The quality of our solutions is evaluated on a synthetic problem and three UCI ML repository datasets. In most cases, we outperform manifold regularization of support vector machines, which is a state-of-the-art approach to semi-supervised max-margin learning.' volume: 9 URL: https://proceedings.mlr.press/v9/kveton10a.html PDF: http://proceedings.mlr.press/v9/kveton10a/kveton10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kveton10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Branislav family: Kveton - given: Michal family: Valko - given: Ali family: Rahimi - given: Ling family: Huang editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 421-428 id: kveton10a issued: date-parts: - 2010 - 3 - 31 firstpage: 421 lastpage: 428 published: 2010-03-31 00:00:00 +0000 - title: 'Solving the Uncapacitated Facility Location Problem Using Message Passing Algorithms' abstract: 'The Uncapacitated Facility Location Problem (UFLP) is one of the most widely studied discrete location problems, whose applications arise in a variety of settings. We tackle the UFLP using probabilistic inference in a graphical model, an approach that has received little attention in the past. We show that the fixed points of max-product linear programming (MPLP), a convexified version of the max-product algorithm, can be used to construct a solution with a 3-approximation guarantee for metric UFLP instances. In addition, we characterize some scenarios under which the MPLP solution is guaranteed to be globally optimal. We evaluate the performance of both max-sum and MPLP empirically on metric and non-metric problems, demonstrating the advantages of the 3-approximation construction and the algorithm''s applicability to non-metric instances.'
volume: 9 URL: https://proceedings.mlr.press/v9/lazic10a.html PDF: http://proceedings.mlr.press/v9/lazic10a/lazic10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lazic10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Nevena family: Lazic - given: Brendan family: Frey - given: Parham family: Aarabi editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 429-436 id: lazic10a issued: date-parts: - 2010 - 3 - 31 firstpage: 429 lastpage: 436 published: 2010-03-31 00:00:00 +0000 - title: 'Relating Function Class Complexity and Cluster Structure in the Function Domain with Applications to Transduction' abstract: 'We relate function class complexity to structure in the function domain. This facilitates risk analysis relative to cluster structure in the input space which is particularly effective in semi-supervised learning. In particular we quantify the complexity of function classes defined over a graph in terms of the graph structure.' volume: 9 URL: https://proceedings.mlr.press/v9/lever10a.html PDF: http://proceedings.mlr.press/v9/lever10a/lever10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lever10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Guy family: Lever editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 437-444 id: lever10a issued: date-parts: - 2010 - 3 - 31 firstpage: 437 lastpage: 444 published: 2010-03-31 00:00:00 +0000 - title: 'The Feature Selection Path in Kernel Methods' abstract: 'The problem of automatic feature selection/weighting in kernel methods is examined. We work on a formulation that optimizes both the weights of features and the parameters of the kernel model simultaneously, using $L_1$ regularization for feature selection. Under quite general choices of kernels, we prove that there exists a unique regularization path for this problem, which runs from 0 to a stationary point of the non-regularized problem. We propose an ODE-based homotopy method to follow this trajectory. By following the path, our algorithm is able to automatically discard irrelevant features and to automatically go back and forth to avoid local optima. Experiments on synthetic and real datasets show that the method achieves low prediction error and is efficient in separating relevant from irrelevant features.'
volume: 9 URL: https://proceedings.mlr.press/v9/li10a.html PDF: http://proceedings.mlr.press/v9/li10a/li10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-li10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Fuxin family: Li - given: Cristian family: Sminchisescu editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 445-452 id: li10a issued: date-parts: - 2010 - 3 - 31 firstpage: 445 lastpage: 452 published: 2010-03-31 00:00:00 +0000 - title: 'Simple Exponential Family PCA' abstract: 'Bayesian principal component analysis (BPCA), a probabilistic reformulation of PCA with Bayesian model selection, is a systematic approach to determining the number of essential principal components (PCs) for data representation. However, it assumes that data are Gaussian distributed and thus it cannot handle all types of practical observations, e.g. integers and binary values. In this paper, we propose simple exponential family PCA (SePCA), a generalised family of probabilistic principal component analysers. SePCA employs exponential family distributions to handle general types of observations. By using Bayesian inference, SePCA also automatically discovers the number of essential PCs. We discuss techniques for fitting the model, develop the corresponding mixture model, and show the effectiveness of the model based on experiments.' volume: 9 URL: https://proceedings.mlr.press/v9/li10b.html PDF: http://proceedings.mlr.press/v9/li10b/li10b.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-li10b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Jun family: Li - given: Dacheng family: Tao editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 453-460 id: li10b issued: date-parts: - 2010 - 3 - 31 firstpage: 453 lastpage: 460 published: 2010-03-31 00:00:00 +0000 - title: 'The Group Dantzig Selector' abstract: 'We introduce a new method – the group Dantzig selector – for high dimensional sparse regression with group structure, which comes with a convincing theory of why utilizing the group structure can be beneficial. Under a group restricted isometry condition, we obtain a significantly improved nonasymptotic $\ell_2$-norm bound over the basis pursuit or the Dantzig selector, which ignore the group structure. To gain more insight, we also introduce a surprisingly simple and intuitive “sparsity oracle condition” to obtain a block $\ell_1$-norm bound, which is easily accessible to a broad audience in the machine learning community. Encouraging numerical results are also provided to support our theory.'
volume: 9 URL: https://proceedings.mlr.press/v9/liu10a.html PDF: http://proceedings.mlr.press/v9/liu10a/liu10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-liu10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Han family: Liu - given: Jian family: Zhang - given: Xiaoye family: Jiang - given: Jun family: Liu editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 461-468 id: liu10a issued: date-parts: - 2010 - 3 - 31 firstpage: 461 lastpage: 468 published: 2010-03-31 00:00:00 +0000 - title: 'Descent Methods for Tuning Parameter Refinement' abstract: 'This paper addresses multidimensional tuning parameter selection in the context of “train-validate-test” and $K$-fold cross validation. A coarse grid search over tuning parameter space is used to initialize a descent method which then jointly optimizes over variables and tuning parameters. We study four regularized regression methods and develop the update equations for the corresponding descent algorithms. Experiments on both simulated and real-world datasets show that the method results in significant tuning parameter refinement.' volume: 9 URL: https://proceedings.mlr.press/v9/lorbert10a.html PDF: http://proceedings.mlr.press/v9/lorbert10a/lorbert10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lorbert10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Alexander family: Lorbert - given: Peter family: Ramadge editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 469-476 id: lorbert10a issued: date-parts: - 2010 - 3 - 31 firstpage: 469 lastpage: 476 published: 2010-03-31 00:00:00 +0000 - title: 'Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net' abstract: 'A new approach to regression regularization called the Pairwise Elastic Net is proposed. Like the Elastic Net, it simultaneously performs automatic variable selection and continuous shrinkage. In addition, the Pairwise Elastic Net encourages the grouping of strongly correlated predictors based on a pairwise similarity measure. We give examples of how the Pairwise Elastic Net can be used to achieve the objectives of Ridge regression, the Lasso, the Elastic Net, and Group Lasso. Finally, we present a coordinate descent algorithm to solve the Pairwise Elastic Net.' 
volume: 9 URL: https://proceedings.mlr.press/v9/lorbert10b.html PDF: http://proceedings.mlr.press/v9/lorbert10b/lorbert10b.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lorbert10b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Alexander family: Lorbert - given: David family: Eis - given: Victoria family: Kostina - given: David family: Blei - given: Peter family: Ramadge editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 477-484 id: lorbert10b issued: date-parts: - 2010 - 3 - 31 firstpage: 477 lastpage: 484 published: 2010-03-31 00:00:00 +0000 - title: 'Contextual Multi-Armed Bandits' abstract: 'We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions so as to maximize the total payoff of the chosen actions. The payoff depends on both the action chosen and the context. In contrast, context-free multi-armed bandit problems, a focus of much previous research, model situations where no side information is available and the payoff depends only on the action chosen. Our problem is motivated by sponsored web search, where the task is to display ads to a user of an Internet search engine based on her search query so as to maximize the click-through rate (CTR) of the ads displayed. We cast this problem as a contextual multi-armed bandit problem where queries and ads form metric spaces and the payoff function is Lipschitz with respect to both the metrics. For any $\epsilon > 0$ we present an algorithm with regret $O(T^{\frac{a+b+1}{a+b+2} + \epsilon})$ where $a, b$ are the covering dimensions of the query space and the ad space, respectively. We prove a lower bound $\Omega(T^{\frac{\tilde{a}+\tilde{b}+1}{\tilde{a}+\tilde{b}+2} - \epsilon})$ for the regret of any algorithm, where $\tilde{a}, \tilde{b}$ are the packing dimensions of the query space and the ad space, respectively. For finite spaces or convex bounded subsets of Euclidean spaces, this gives almost matching upper and lower bounds.' volume: 9 URL: https://proceedings.mlr.press/v9/lu10a.html PDF: http://proceedings.mlr.press/v9/lu10a/lu10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lu10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Tyler family: Lu - given: David family: Pal - given: Martin family: Pal editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 485-492 id: lu10a issued: date-parts: - 2010 - 3 - 31 firstpage: 485 lastpage: 492 published: 2010-03-31 00:00:00 +0000 - title: 'Exploiting Feature Covariance in High-Dimensional Online Learning' abstract: 'Some online algorithms for linear classification model the uncertainty in their weights over the course of learning. Modeling the full covariance structure of the weights can provide a significant advantage for classification.
However, for high-dimensional, large-scale data, even though there may be many second-order feature interactions, it is computationally infeasible to maintain this covariance structure. To extend second-order methods to high-dimensional data, we develop low-rank approximations of the covariance structure. We evaluate our approach on both synthetic and real-world data sets using the confidence-weighted online learning framework. We show improvements over diagonal covariance matrices for both low- and high-dimensional data.' volume: 9 URL: https://proceedings.mlr.press/v9/ma10a.html PDF: http://proceedings.mlr.press/v9/ma10a/ma10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ma10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Justin family: Ma - given: Alex family: Kulesza - given: Mark family: Dredze - given: Koby family: Crammer - given: Lawrence family: Saul - given: Fernando family: Pereira editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 493-500 id: ma10a issued: date-parts: - 2010 - 3 - 31 firstpage: 493 lastpage: 500 published: 2010-03-31 00:00:00 +0000 - title: 'Supervised Dimension Reduction Using Bayesian Mixture Modeling' abstract: 'We develop a Bayesian framework for supervised dimension reduction using a flexible nonparametric Bayesian mixture modeling approach. Our method retrieves the dimension reduction or d.r. subspace by utilizing a dependent Dirichlet process that allows for natural clustering of the data in terms of both the response and predictor variables. Formal probabilistic models with likelihoods and priors are given and efficient posterior sampling of the d.r. subspace can be obtained by a Gibbs sampler. As the posterior draws are linear subspaces which are points on a Grassmann manifold, we output the posterior mean d.r. subspace with respect to geodesics on the Grassmannian. The utility of our approach is illustrated on a set of simulated and real examples. Key words: supervised dimension reduction, inverse regression, Dirichlet process, factor models, Grassmann manifold.' volume: 9 URL: https://proceedings.mlr.press/v9/mao10a.html PDF: http://proceedings.mlr.press/v9/mao10a/mao10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-mao10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Kai family: Mao - given: Feng family: Liang - given: Sayan family: Mukherjee editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 501-508 id: mao10a issued: date-parts: - 2010 - 3 - 31 firstpage: 501 lastpage: 508 published: 2010-03-31 00:00:00 +0000 - title: 'Inductive Principles for Restricted Boltzmann Machine Learning' abstract: 'Recent research has seen the proposal of several new inductive principles designed specifically to avoid the problems associated with maximum likelihood learning in models with intractable partition functions. In this paper, we study learning methods for binary restricted Boltzmann machines (RBMs) based on ratio matching and generalized score matching.
We compare these new RBM learning methods to a range of existing learning methods including stochastic maximum likelihood, contrastive divergence, and pseudo-likelihood. We perform an extensive empirical evaluation across multiple tasks and data sets.' volume: 9 URL: https://proceedings.mlr.press/v9/marlin10a.html PDF: http://proceedings.mlr.press/v9/marlin10a/marlin10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-marlin10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Benjamin family: Marlin - given: Kevin family: Swersky - given: Bo family: Chen - given: Nando family: Freitas editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 509-516 id: marlin10a issued: date-parts: - 2010 - 3 - 31 firstpage: 509 lastpage: 516 published: 2010-03-31 00:00:00 +0000 - title: 'Parallelizable Sampling of Markov Random Fields' abstract: 'Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks. Every application of an MRF requires addressing its inference problem, which can be done using deterministic inference methods or using stochastic Markov Chain Monte Carlo methods. In this paper we introduce a new Markov Chain transition operator that updates all the variables of a pairwise MRF in parallel by using auxiliary Gaussian variables. The proposed MCMC operator is extremely simple to implement and to parallelize. This is achieved by a formal equivalence result between arbitrary pairwise MRFs and a particular type of Restricted Boltzmann Machine. This result also implies that the latter can be learned in place of the former without any loss of modeling power, a possibility we explore in experiments.' volume: 9 URL: https://proceedings.mlr.press/v9/martens10a.html PDF: http://proceedings.mlr.press/v9/martens10a/martens10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-martens10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: James family: Martens - given: Ilya family: Sutskever editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 517-524 id: martens10a issued: date-parts: - 2010 - 3 - 31 firstpage: 517 lastpage: 524 published: 2010-03-31 00:00:00 +0000 - title: 'Exploiting Within-Clique Factorizations in Junction-Tree Algorithms' abstract: 'We show that the expected computational complexity of the Junction-Tree Algorithm for maximum a posteriori inference in graphical models can be improved. Our results apply whenever the potentials over maximal cliques of the triangulated graph are factored over subcliques. This is common in many real applications, as we illustrate with several examples. The new algorithms are easily implemented, and experiments show substantial speed-ups over the classical Junction-Tree Algorithm. This enlarges the class of models for which exact inference is efficient.'
volume: 9 URL: https://proceedings.mlr.press/v9/mcauley10a.html PDF: http://proceedings.mlr.press/v9/mcauley10a/mcauley10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-mcauley10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Julian family: McAuley - given: Tiberio family: Caetano editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 525-532 id: mcauley10a issued: date-parts: - 2010 - 3 - 31 firstpage: 525 lastpage: 532 published: 2010-03-31 00:00:00 +0000 - title: 'Discriminative Topic Segmentation of Text and Speech' abstract: 'We explore automated discovery of topically-coherent segments in speech or text sequences. We give two new discriminative topic segmentation algorithms which employ a new measure of text similarity based on word co-occurrence. Both algorithms function by finding extrema in the similarity signal over the text, with the latter algorithm using a compact support-vector based description of a window of text or speech observations in word similarity space to overcome noise introduced by speech recognition errors and off-topic content. In experiments over speech and text news streams, we show that these algorithms outperform previous methods. We observe that topic segmentation of speech recognizer output is a more difficult problem than that of text streams; however, we demonstrate that by using a lattice of competing hypotheses rather than just the one-best hypothesis as input to the segmentation algorithm, the performance of the algorithm can be improved.' volume: 9 URL: https://proceedings.mlr.press/v9/mohri10a.html PDF: http://proceedings.mlr.press/v9/mohri10a/mohri10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-mohri10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Mehryar family: Mohri - given: Pedro family: Moreno - given: Eugene family: Weinstein editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 533-540 id: mohri10a issued: date-parts: - 2010 - 3 - 31 firstpage: 533 lastpage: 540 published: 2010-03-31 00:00:00 +0000 - title: 'Elliptical slice sampling' abstract: 'Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it has simple, generic code applicable to many models, 2) it has no free parameters, 3) it works well for a variety of Gaussian process based models. These properties make our method ideal for use while model building, removing the need to spend time deriving and tuning updates for more complex algorithms.' 
volume: 9 URL: https://proceedings.mlr.press/v9/murray10a.html PDF: http://proceedings.mlr.press/v9/murray10a/murray10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-murray10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Iain family: Murray - given: Ryan family: Adams - given: David family: MacKay editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 541-548 id: murray10a issued: date-parts: - 2010 - 3 - 31 firstpage: 541 lastpage: 548 published: 2010-03-31 00:00:00 +0000 - title: 'Near-Optimal Evasion of Convex-Inducing Classifiers' abstract: 'Classifiers are often used to detect miscreant activities. We study how an adversary can efficiently query a classifier to elicit information that allows the adversary to evade detection at near-minimal cost. We generalize results of Lowd and Meek (2005) to convex-inducing classifiers. We present algorithms that construct undetected instances of near-minimal cost using only polynomially many queries in the dimension of the space and without reverse engineering the decision boundary.' volume: 9 URL: https://proceedings.mlr.press/v9/nelson10a.html PDF: http://proceedings.mlr.press/v9/nelson10a/nelson10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-nelson10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Blaine family: Nelson - given: Benjamin family: Rubinstein - given: Ling family: Huang - given: Anthony family: Joseph - given: Shing–hon family: Lau - given: Steven family: Lee - given: Satish family: Rao - given: Anthony family: Tran - given: Doug family: Tygar editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 549-556 id: nelson10a issued: date-parts: - 2010 - 3 - 31 firstpage: 549 lastpage: 556 published: 2010-03-31 00:00:00 +0000 - title: 'Incremental Sparsification for Real-time Online Model Learning' abstract: 'Online model learning in real-time is required by many applications, for example, robot tracking control. It poses a difficult problem, as fast and incremental online regression with large data sets is the essential component and cannot be realized by straightforward usage of off-the-shelf machine learning methods such as Gaussian process regression or support vector regression. In this paper, we propose a framework for online, incremental sparsification with a fixed budget designed for large-scale real-time model learning. The proposed approach combines a sparsification method based on an independence measure with a large-scale database. In combination with an incremental learning approach such as sequential support vector regression, we obtain a regression method which is applicable in real-time online learning. It exhibits competitive learning accuracy when compared with standard regression techniques. Implementation on a real robot emphasizes the applicability of the proposed approach in real-time online model learning for real-world systems.'
volume: 9 URL: https://proceedings.mlr.press/v9/nguyen_tuong10a.html PDF: http://proceedings.mlr.press/v9/nguyen_tuong10a/nguyen_tuong10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-nguyen_tuong10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Duy family: Nguyen–Tuong - given: Jan family: Peters editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 557-564 id: nguyen_tuong10a issued: date-parts: - 2010 - 3 - 31 firstpage: 557 lastpage: 564 published: 2010-03-31 00:00:00 +0000 - title: 'Fluid Dynamics Models for Low Rank Discriminant Analysis' abstract: 'We consider the problem of reducing the dimensionality of labeled data for classification. Unfortunately, the optimal approach of finding the low-dimensional projection with minimal Bayes classification error is intractable, so most standard algorithms optimize a tractable heuristic function in the projected subspace. Here, we investigate a physics-based model where we consider the labeled data as interacting fluid distributions. We derive the forces arising in the fluids from information theoretic potential functions, and consider appropriate low rank constraints on the resulting acceleration and velocity flow fields. We show how to apply the Gauss principle of least constraint in fluids to obtain tractable solutions for low rank projections. Our fluid dynamic approach is demonstrated to better approximate the Bayes optimal solution on Gaussian systems, including infinite dimensional Gaussian processes.' volume: 9 URL: https://proceedings.mlr.press/v9/noh10a.html PDF: http://proceedings.mlr.press/v9/noh10a/noh10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-noh10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Yung–Kyun family: Noh - given: Byoung–Tak family: Zhang - given: Daniel family: Lee editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 565-572 id: noh10a issued: date-parts: - 2010 - 3 - 31 firstpage: 565 lastpage: 572 published: 2010-03-31 00:00:00 +0000 - title: 'Approximation of hidden Markov models by mixtures of experts with application to particle filtering' abstract: 'Conveniently selecting the proposal kernel and the adjustment multiplier weights of the auxiliary particle filter may significantly increase the accuracy and computational efficiency of the method. However, in practice the optimal proposal kernel and multiplier weights are seldom known. In this paper we present a simulation-based method for constructing, offline, an approximation of these quantities that makes the filter close to fully adapted at a reasonable computational cost. The approximation is constructed as a mixture of experts optimised through an efficient stochastic approximation algorithm. The method is illustrated on two simulated examples.'
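For orientation on the filter being adapted in the abstract above, here is a minimal auxiliary-particle-filter step assuming bootstrap proposals and equally weighted input particles; log_eta plays the role of the adjustment multiplier weights that the paper proposes to learn offline, and all names are ours.

```python
import numpy as np

def apf_step(x, rng, transition, loglik, log_eta):
    """One auxiliary particle filter step (bootstrap proposals,
    equally weighted input particles).

    x          -- (N, d) array of current particles
    transition -- samples next states given an array of states
    loglik     -- log p(y_next | state) for an array of states
    log_eta    -- log adjustment-multiplier weights, one per particle
    """
    n = len(x)
    # first stage: favour particles likely to explain the next observation
    v = np.exp(log_eta - log_eta.max())
    idx = rng.choice(n, size=n, p=v / v.sum())
    # second stage: propagate, then correct for the adjustment weights
    x_new = transition(x[idx])
    logw = loglik(x_new) - log_eta[idx]
    w = np.exp(logw - logw.max())
    return x_new[rng.choice(n, size=n, p=w / w.sum())]
```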
volume: 9 URL: https://proceedings.mlr.press/v9/olsson10a.html PDF: http://proceedings.mlr.press/v9/olsson10a/olsson10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-olsson10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Jimmy family: Olsson - given: Jonas family: Ströjby editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 573-580 id: olsson10a issued: date-parts: - 2010 - 3 - 31 firstpage: 573 lastpage: 580 published: 2010-03-31 00:00:00 +0000 - title: 'A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection' abstract: 'We propose a generalization of the Multiple-try Metropolis (MTM) algorithm of Liu et al. (2000), which is based on drawing several proposals at each step and randomly choosing one of them on the basis of weights that may be arbitrarily chosen. In particular, for Bayesian estimation we also introduce a method based on weights depending on a quadratic approximation of the posterior distribution. The resulting algorithm cannot be reformulated as an MTM algorithm and leads to a comparable gain of efficiency with a lower computational effort. We also outline the extension of the proposed strategy, and then of the MTM strategy, to Bayesian model selection, casting it in a Reversible Jump framework. The approach is illustrated by real examples.' volume: 9 URL: https://proceedings.mlr.press/v9/pandolfi10a.html PDF: http://proceedings.mlr.press/v9/pandolfi10a/pandolfi10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-pandolfi10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Silvia family: Pandolfi - given: Francesco family: Bartolucci - given: Nial family: Friel editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 581-588 id: pandolfi10a issued: date-parts: - 2010 - 3 - 31 firstpage: 581 lastpage: 588 published: 2010-03-31 00:00:00 +0000 - title: 'Bayesian structure discovery in Bayesian networks with less space' abstract: 'Current exact algorithms for score-based structure discovery in Bayesian networks on $n$ nodes run in time and space within a polynomial factor of $2^n$. For practical use, the space requirement is the bottleneck, which motivates trading space against time. Here, previous results on finding an optimal network structure in less space are extended in two directions. First, we consider the problem of computing the posterior probability of a given arc set. Second, we operate with the general partial order framework and its specialization to bucket orders, introduced recently for related permutation problems. The main technical contribution is the development of a fast algorithm for a novel zeta transform variant, which may be of independent interest.'
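The zeta transform mentioned at the end of the abstract above generalizes a textbook primitive. For orientation, this is the standard fast up-zeta transform over the subset lattice, computing all $2^n$ subset sums in $O(n 2^n)$ time; the paper's variant and its space trade-offs go beyond this sketch.

```python
def fast_zeta_transform(f, n):
    """Up-zeta transform: g[S] = sum of f[T] over all subsets T of S.

    f is a list of length 2**n indexed by bitmask subsets of {0,...,n-1}.
    Runs in O(n * 2**n), versus O(3**n) for the naive double loop.
    """
    g = list(f)
    for i in range(n):                # sweep one element at a time
        for S in range(1 << n):
            if S & (1 << i):          # S contains element i:
                g[S] += g[S ^ (1 << i)]  # absorb the sum without i
    return g
```

g[S] is exactly the quantity that dynamic programs over node subsets, such as exact structure-discovery scores, are built from.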
volume: 9 URL: https://proceedings.mlr.press/v9/parviainen10a.html PDF: http://proceedings.mlr.press/v9/parviainen10a/parviainen10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-parviainen10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Pekka family: Parviainen - given: Mikko family: Koivisto editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 589-596 id: parviainen10a issued: date-parts: - 2010 - 3 - 31 firstpage: 589 lastpage: 596 published: 2010-03-31 00:00:00 +0000 - title: 'Identifying Cause and Effect on Discrete Data using Additive Noise Models' abstract: 'Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. Recently, methods using additive noise models have been suggested to approach the case of continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work we extend the notion of additive noise models to these cases. Whenever the joint distribution $\mathbf{P}^{(X,Y)}$ admits such a model in one direction, e.g. $Y = f(X)+N$, $N \perp \!\!\! \perp X$, it does not admit the reversed model $X=g(Y)+\tilde{N}$, $\tilde{N} \perp \!\!\! \perp Y$ as long as the model is chosen in a generic way. Based on these deliberations we propose an efficient new algorithm that is able to distinguish between cause and effect for a finite sample of discrete variables. We show that this algorithm works both on synthetic and real data sets.' volume: 9 URL: https://proceedings.mlr.press/v9/peters10a.html PDF: http://proceedings.mlr.press/v9/peters10a/peters10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-peters10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Jonas family: Peters - given: Dominik family: Janzing - given: Bernhard family: Schölkopf editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 597-604 id: peters10a issued: date-parts: - 2010 - 3 - 31 firstpage: 597 lastpage: 604 published: 2010-03-31 00:00:00 +0000 - title: 'REGO: Rank-based Estimation of Renyi Information using Euclidean Graph Optimization' abstract: 'We propose a new method for non-parametric estimation of Renyi and Shannon information for a multivariate distribution using a corresponding copula, a multivariate distribution over normalized ranks of the data. As the information of the distribution is the same as the negative entropy of its copula, our method estimates this information by solving a Euclidean graph optimization problem on the empirical estimate of the distribution’s copula. Owing to the properties of the copula, we show that the resulting estimator of Renyi information is strongly consistent and robust. Further, we demonstrate its applicability in image registration in addition to simulated experiments.'
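A hedged sketch of the pipeline the REGO abstract describes: map the sample to its empirical copula via normalized ranks, then measure the length of a Euclidean minimal spanning tree on the transformed points; by classical Beardwood-Halton-Hammersley-type results, the Rényi information estimate is an affine function of the logarithm of that length. Constants are omitted and all names below are ours.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from scipy.stats import rankdata

def empirical_copula(X):
    """Map each column of X (n samples, d dims) to normalized ranks in (0, 1)."""
    return np.column_stack([rankdata(col) / (len(col) + 1.0) for col in X.T])

def mst_length(Z, gamma=1.0):
    """Total gamma-weighted edge length of the Euclidean MST on the points Z."""
    D = squareform(pdist(Z)) ** gamma
    return minimum_spanning_tree(D).sum()
```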
volume: 9 URL: https://proceedings.mlr.press/v9/poczos10a.html PDF: http://proceedings.mlr.press/v9/poczos10a/poczos10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-poczos10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Barnabas family: Poczos - given: Sergey family: Kirshner - given: Csaba family: Szepesvári editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 605-612 id: poczos10a issued: date-parts: - 2010 - 3 - 31 firstpage: 605 lastpage: 612 published: 2010-03-31 00:00:00 +0000 - title: 'Infinite Predictor Subspace Models for Multitask Learning' abstract: 'Given several related learning tasks, we propose a nonparametric Bayesian model that captures task relatedness by assuming that the task parameters (i.e., predictors) share a latent subspace. More specifically, the intrinsic dimensionality of the task subspace is not assumed to be known a priori. We use an infinite latent feature model to automatically infer this number (depending on and limited by only the number of tasks). Furthermore, our approach is applicable when the underlying task parameter subspace is inherently sparse, drawing parallels with $\ell_1$ regularization and LASSO-style models. We also propose an augmented model which can make use of (labeled, and additionally unlabeled if available) inputs to assist learning this subspace, leading to further improvements in performance. Experimental results demonstrate the efficacy of both the proposed approaches, especially when the number of examples per task is small. Finally, we discuss an extension of the proposed framework where a nonparametric mixture of linear subspaces can be used to learn a manifold over the task parameters, and also deal with the issue of negative transfer from unrelated tasks.' volume: 9 URL: https://proceedings.mlr.press/v9/rai10a.html PDF: http://proceedings.mlr.press/v9/rai10a/rai10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-rai10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Piyush family: Rai - given: Hal family: Daumé suffix: III editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 613-620 id: rai10a issued: date-parts: - 2010 - 3 - 31 firstpage: 613 lastpage: 620 published: 2010-03-31 00:00:00 +0000 - title: 'Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images' abstract: 'Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM) which is used as a module for learning deep belief nets one layer at a time. The Gaussian-Binary RBMs that have been used to model real-valued data are not a good way to model the covariance structure of natural images. We propose a factored 3-way RBM that uses the states of its hidden units to represent abnormalities in the local covariance structure of an image. This provides a probabilistic framework for the widely used simple/complex cell architecture. 
Our model learns binary features that work very well for object recognition on the “tiny images” data set. Even better features are obtained by then using standard binary RBM’s to learn a deeper model.' volume: 9 URL: https://proceedings.mlr.press/v9/ranzato10a.html PDF: http://proceedings.mlr.press/v9/ranzato10a/ranzato10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ranzato10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Marc’Aurelio family: Ranzato - given: Alex family: Krizhevsky - given: Geoffrey family: Hinton editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 621-628 id: ranzato10a issued: date-parts: - 2010 - 3 - 31 firstpage: 621 lastpage: 628 published: 2010-03-31 00:00:00 +0000 - title: 'Nonparametric prior for adaptive sparsity' abstract: 'For high-dimensional problems various parametric priors have been proposed to promote sparse solutions. While parametric priors have shown considerable success, they are not very robust in adapting to varying degrees of sparsity. In this work we propose a discrete mixture prior which is partially nonparametric. The right structure for the prior and the amount of sparsity are estimated directly from the data. Our experiments show that the proposed prior adapts to sparsity much better than its parametric counterparts. We apply the proposed method to classification of high dimensional microarray datasets.' volume: 9 URL: https://proceedings.mlr.press/v9/raykar10a.html PDF: http://proceedings.mlr.press/v9/raykar10a/raykar10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-raykar10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Vikas family: Raykar - given: Linda family: Zhao editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 629-636 id: raykar10a issued: date-parts: - 2010 - 3 - 31 firstpage: 629 lastpage: 636 published: 2010-03-31 00:00:00 +0000 - title: 'Convexity of Proper Composite Binary Losses' abstract: 'A composite loss assigns a penalty to a real-valued prediction by associating the prediction with a probability via a link function and then applying a class probability estimation (CPE) loss. If the risk for a composite loss is always minimised by predicting the value associated with the true class probability, the composite loss is proper. We provide a novel, explicit and complete characterisation of the convexity of any proper composite loss in terms of its link and its “weight function” associated with its proper CPE loss.'
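To make the objects in the abstract above concrete, here is one well-known special case (our illustration, not the paper's full characterisation): composing a proper CPE loss with its canonical link, the link whose derivative matches the weight function, always yields a convex composite loss.

```latex
\[
  \ell(y, v) = \ell_{\mathrm{CPE}}\bigl(y, \psi^{-1}(v)\bigr),
  \qquad
  \psi'(c) = w(c) \ \text{(canonical link)}
  \;\Longrightarrow\;
  \ell(y, \cdot) \ \text{convex}.
\]
\[
  \text{Log-loss: } w(c) = \frac{1}{c(1-c)}
  \;\Longrightarrow\;
  \psi(c) = \log\frac{c}{1-c},
  \qquad
  \ell(y, v) = \log\bigl(1 + e^{-yv}\bigr),\ y \in \{-1, +1\}.
\]
```

That is, the canonical link for log-loss is the logit, and the resulting composite loss is the familiar convex logistic loss.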
volume: 9 URL: https://proceedings.mlr.press/v9/reid10a.html PDF: http://proceedings.mlr.press/v9/reid10a/reid10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-reid10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Mark family: Reid - given: Robert family: Williamson editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 637-644 id: reid10a issued: date-parts: - 2010 - 3 - 31 firstpage: 637 lastpage: 644 published: 2010-03-31 00:00:00 +0000 - title: 'Gaussian processes with monotonicity information' abstract: 'A method for using monotonicity information in multivariate Gaussian process regression and classification is proposed. Monotonicity information is introduced with virtual derivative observations, and the resulting posterior is approximated with expectation propagation. Behaviour of the method is illustrated with artificial regression examples, and the method is used in a real world health care classification problem to include monotonicity information with respect to one of the covariates.' volume: 9 URL: https://proceedings.mlr.press/v9/riihimaki10a.html PDF: http://proceedings.mlr.press/v9/riihimaki10a/riihimaki10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-riihimaki10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Jaakko family: Riihimäki - given: Aki family: Vehtari editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 645-652 id: riihimaki10a issued: date-parts: - 2010 - 3 - 31 firstpage: 645 lastpage: 652 published: 2010-03-31 00:00:00 +0000 - title: 'A Regularization Approach to Nonlinear Variable Selection' abstract: 'In this paper we consider a regularization approach to variable selection when the regression function depends nonlinearly on a few input variables. The proposed method is based on a regularized least square estimator penalizing large values of the partial derivatives. An efficient iterative procedure is proposed to solve the underlying variational problem, and its convergence is proved. The empirical properties of the obtained estimator are tested both for prediction and variable selection. The algorithm compares favorably to more standard ridge regression and $\ell_1$ regularization schemes.'
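In empirical form, the penalty described in the abstract above is a group norm over the matrix of partial derivatives evaluated at the training points. A minimal sketch, in our own notation:

```python
import numpy as np

def derivative_group_penalty(J):
    """Group-norm regularizer on partial derivatives.

    J[i, j] -- partial derivative of the estimator with respect to input
               variable j, evaluated at training point i.
    Summing the l2 norms of the columns penalizes each input variable as
    a group, so irrelevant variables are driven to an identically zero
    derivative, in the manner of group lasso.
    """
    return np.linalg.norm(J, axis=0).sum()
```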
volume: 9 URL: https://proceedings.mlr.press/v9/rosasco10a.html PDF: http://proceedings.mlr.press/v9/rosasco10a/rosasco10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-rosasco10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Lorenzo family: Rosasco - given: Matteo family: Santoro - given: Sofia family: Mosci - given: Alessandro family: Verri - given: Silvia family: Villa editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 653-660 id: rosasco10a issued: date-parts: - 2010 - 3 - 31 firstpage: 653 lastpage: 660 published: 2010-03-31 00:00:00 +0000 - title: 'Efficient Reductions for Imitation Learning' abstract: 'Imitation Learning, while applied successfully to many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d. This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros., given input images from the games and corresponding actions taken by a human expert and near-optimal planner, respectively.' volume: 9 URL: https://proceedings.mlr.press/v9/ross10a.html PDF: http://proceedings.mlr.press/v9/ross10a/ross10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ross10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Stephane family: Ross - given: Drew family: Bagnell editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 661-668 id: ross10a issued: date-parts: - 2010 - 3 - 31 firstpage: 661 lastpage: 668 published: 2010-03-31 00:00:00 +0000 - title: 'Approximate parameter inference in a stochastic reaction-diffusion model' abstract: 'We present an approximate inference approach to parameter estimation in a spatio-temporal stochastic process of the reaction-diffusion type. The continuous space limit of an inference method for Markov jump processes leads to an approximation which is related to a spatial Gaussian process. An efficient solution in feature space using a Fourier basis is applied to inference on simulated data.'
volume: 9 URL: https://proceedings.mlr.press/v9/ruttor10a.html PDF: http://proceedings.mlr.press/v9/ruttor10a/ruttor10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ruttor10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Andreas family: Ruttor - given: Manfred family: Opper editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 669-676 id: ruttor10a issued: date-parts: - 2010 - 3 - 31 firstpage: 669 lastpage: 676 published: 2010-03-31 00:00:00 +0000 - title: 'Active Sequential Learning with Tactile Feedback' abstract: 'We consider the problem of tactile discrimination, with the goal of estimating an underlying state parameter in a sequential setting. If the data is continuous and high-dimensional, collecting enough representative data samples becomes difficult. We present a framework that uses active learning to help with the sequential gathering of data samples, using information-theoretic criteria to find optimal actions at each time step. We consider two approaches to recursively update the state parameter belief: an analytical Gaussian approximation and a Monte Carlo sampling method. We show how both active frameworks improve convergence, demonstrating results on a real robotic hand-arm system that estimates the viscosity of liquids from tactile feedback data.' volume: 9 URL: https://proceedings.mlr.press/v9/saal10a.html PDF: http://proceedings.mlr.press/v9/saal10a/saal10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-saal10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Hannes family: Saal - given: Jo–Anne family: Ting - given: Sethu family: Vijayakumar editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 677-684 id: saal10a issued: date-parts: - 2010 - 3 - 31 firstpage: 677 lastpage: 684 published: 2010-03-31 00:00:00 +0000 - title: 'Reducing Label Complexity by Learning From Bags' abstract: 'We consider a supervised learning setting in which the main cost of learning is the number of training labels and one can obtain a single label for a bag of examples, indicating only if a positive example exists in the bag, as in Multi-Instance Learning. We thus propose to create a training sample of bags, and to use the obtained labels to learn to classify individual examples. We provide a theoretical analysis showing how to select the bag size as a function of the problem parameters, and prove that if the original labels are distributed unevenly, the number of required labels drops considerably when learning from bags. We demonstrate that finding a low-error separating hyperplane from bags is feasible in this setting using a simple iterative procedure similar to latent SVM. Experiments on synthetic and real data sets demonstrate the success of the approach.' 
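The counting argument behind the label savings in the abstract above is easy to reproduce: if positives occur at rate p and bags are formed from i.i.d. examples, a bag of size k carries a positive label with probability 1 - (1-p)^k, so bag sizes near log(1/2)/log(1-p) balance the bag labels. A toy sketch (the paper's bag-size rule depends on more problem parameters than this):

```python
import math

def bag_positive_prob(p, k):
    """Probability that a bag of k i.i.d. examples contains a positive."""
    return 1.0 - (1.0 - p) ** k

def balanced_bag_size(p):
    """Smallest bag size whose positive-label probability reaches 1/2."""
    return math.ceil(math.log(0.5) / math.log(1.0 - p))

# With 1% positives, bags of 69 examples already give balanced bag labels.
assert balanced_bag_size(0.01) == 69
```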
volume: 9 URL: https://proceedings.mlr.press/v9/sabato10a.html PDF: http://proceedings.mlr.press/v9/sabato10a/sabato10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sabato10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Sivan family: Sabato - given: Nathan family: Srebro - given: Naftali family: Tishby editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 685-692 id: sabato10a issued: date-parts: - 2010 - 3 - 31 firstpage: 685 lastpage: 692 published: 2010-03-31 00:00:00 +0000 - title: 'Efficient Learning of Deep Boltzmann Machines' abstract: 'We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers. We show that using such a recognition model, followed by a combined top-down and bottom-up pass, it is possible to efficiently learn a good generative model of high-dimensional highly-structured sensory input. We show that the additional computations required by incorporating a top-down feedback play a critical role in the performance of a DBM, both as a generative and discriminative model. Moreover, inference is at most three times slower than the approximate inference in a Deep Belief Network (DBN), making large-scale learning of DBM’s practical. Finally, we demonstrate that the DBM’s trained using the proposed approximate inference algorithm perform well compared to DBN’s and SVM’s on the MNIST handwritten digit, OCR English letters, and NORB visual object recognition tasks.' volume: 9 URL: https://proceedings.mlr.press/v9/salakhutdinov10a.html PDF: http://proceedings.mlr.press/v9/salakhutdinov10a/salakhutdinov10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-salakhutdinov10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ruslan family: Salakhutdinov - given: Hugo family: Larochelle editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 693-700 id: salakhutdinov10a issued: date-parts: - 2010 - 3 - 31 firstpage: 693 lastpage: 700 published: 2010-03-31 00:00:00 +0000 - title: 'Factorized Orthogonal Latent Spaces' abstract: 'Existing approaches to multi-view learning are particularly effective when the views are either independent (i.e., multi-kernel approaches) or fully dependent (i.e., shared latent spaces). However, in real scenarios, these assumptions are almost never truly satisfied. Recently, two methods have attempted to tackle this problem by factorizing the information and learning separate latent spaces for modeling the shared (i.e., correlated) and private (i.e., independent) parts of the data. However, these approaches are very sensitive to parameter settings or initialization. In this paper we propose a robust approach to factorizing the latent space into shared and private spaces by introducing orthogonality constraints, which penalize redundant latent representations. 
Furthermore, unlike previous approaches, we simultaneously learn the structure and dimensionality of the latent spaces by relying on a regularizer that encourages the latent space of each data stream to be low dimensional. To demonstrate the benefits of our approach, we apply it to two existing shared latent space models that assume full dependence of the views, the sGPLVM and the sKIE, and show that our constraints improve the performance of these models on the task of pose estimation from monocular images.' volume: 9 URL: https://proceedings.mlr.press/v9/salzmann10a.html PDF: http://proceedings.mlr.press/v9/salzmann10a/salzmann10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-salzmann10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Mathieu family: Salzmann - given: Carl Henrik family: Ek - given: Raquel family: Urtasun - given: Trevor family: Darrell editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 701-708 id: salzmann10a issued: date-parts: - 2010 - 3 - 31 firstpage: 701 lastpage: 708 published: 2010-03-31 00:00:00 +0000 - title: 'Convex Structure Learning in Log-Linear Models: Beyond Pairwise Potentials' abstract: 'Previous work has examined structure learning in log-linear models with $\ell_1$-regularization, largely focusing on the case of pairwise potentials. In this work we consider the case of models with potentials of arbitrary order, but that satisfy a hierarchical constraint. We enforce the hierarchical constraint using group $\ell_1$-regularization with overlapping groups, and an active set method that enforces hierarchical inclusion allows us to tractably consider the exponential number of higher-order potentials. We use a spectral projected gradient method as a sub-routine for solving the overlapping group $\ell_1$-regularization problem, and make use of a sparse version of Dykstra’s algorithm to compute the projection. Our experiments indicate that this model gives equal or better test set likelihood compared to previous models.' volume: 9 URL: https://proceedings.mlr.press/v9/schmidt10a.html PDF: http://proceedings.mlr.press/v9/schmidt10a/schmidt10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-schmidt10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Mark family: Schmidt - given: Kevin family: Murphy editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 709-716 id: schmidt10a issued: date-parts: - 2010 - 3 - 31 firstpage: 709 lastpage: 716 published: 2010-03-31 00:00:00 +0000 - title: 'Polynomial-Time Exact Inference in NP-Hard Binary MRFs via Reweighted Perfect Matching' abstract: 'We develop a new form of reweighting (Wainwright et al., 2005b) to leverage the relationship between Ising spin glasses and perfect matchings into a novel technique for the exact computation of MAP states in hitherto intractable binary Markov random fields. Our method solves an $n \times n$ lattice with external field and random couplings much faster, and for larger $n$, than the best competing algorithms. 
It empirically scales as $O(n^3)$ even though this problem is NP-hard and non-approximable in polynomial time. We discuss limitations of our current implementation and propose ways to overcome them.' volume: 9 URL: https://proceedings.mlr.press/v9/schraudolph10a.html PDF: http://proceedings.mlr.press/v9/schraudolph10a/schraudolph10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-schraudolph10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Nic family: Schraudolph editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 717-724 id: schraudolph10a issued: date-parts: - 2010 - 3 - 31 firstpage: 717 lastpage: 724 published: 2010-03-31 00:00:00 +0000 - title: 'Dense Message Passing for Sparse Principal Component Analysis' abstract: 'We describe a novel inference algorithm for sparse Bayesian PCA with a zero-norm prior on the model parameters. Bayesian inference is very challenging in probabilistic models of this type. MCMC procedures are too slow to be practical in a very high-dimensional setting and standard mean-field variational Bayes algorithms are ineffective. We adopt a dense message passing algorithm similar to algorithms developed in the statistical physics community and previously applied to inference problems in coding and sparse classification. The algorithm achieves near-optimal performance on synthetic data for which a statistical mechanics theory of optimal learning can be derived. We also study two gene expression datasets used in previous studies of sparse PCA. We find our method performs better than one published algorithm and comparably to a second.' volume: 9 URL: https://proceedings.mlr.press/v9/sharp10a.html PDF: http://proceedings.mlr.press/v9/sharp10a/sharp10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sharp10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Kevin family: Sharp - given: Magnus family: Rattray editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 725-732 id: sharp10a issued: date-parts: - 2010 - 3 - 31 firstpage: 725 lastpage: 732 published: 2010-03-31 00:00:00 +0000 - title: 'Empirical Bernstein Boosting' abstract: 'Concentration inequalities that incorporate variance information (such as Bernstein’s or Bennett’s inequality) are often significantly tighter than counterparts (such as Hoeffding’s inequality) that disregard variance. Nevertheless, many state of the art machine learning algorithms for classification problems like AdaBoost and support vector machines (SVMs) extensively use Hoeffding’s inequalities to justify empirical risk minimization and its variants. This article proposes a novel boosting algorithm based on a recently introduced principle–sample variance penalization–which is motivated from an empirical version of Bernstein’s inequality. This framework leads to an efficient algorithm that is as easy to implement as AdaBoost while producing a strict generalization. Experiments on a large number of datasets show significant performance gains over AdaBoost. 
This paper shows that sample variance penalization could be a viable alternative to empirical risk minimization.' volume: 9 URL: https://proceedings.mlr.press/v9/shivaswamy10a.html PDF: http://proceedings.mlr.press/v9/shivaswamy10a/shivaswamy10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-shivaswamy10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Pannagadatta family: Shivaswamy - given: Tony family: Jebara editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 733-740 id: shivaswamy10a issued: date-parts: - 2010 - 3 - 31 firstpage: 733 lastpage: 740 published: 2010-03-31 00:00:00 +0000 - title: 'Reduced-Rank Hidden Markov Models' abstract: 'Hsu et al. (2009) recently proposed an efficient, accurate spectral learning algorithm for Hidden Markov Models (HMMs). In this paper we relax their assumptions and prove a tighter finite-sample error bound for the case of Reduced-Rank HMMs, i.e., HMMs with low-rank transition matrices. Since rank-$k$ RR-HMMs are a larger class of models than $k$-state HMMs while being equally efficient to work with, this relaxation greatly increases the learning algorithm’s scope. In addition, we generalize the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations. Finally we prove consistency for learning Predictive State Representations, an even larger class of models. Experiments on synthetic data and a toy video, as well as on difficult robot vision data, yield accurate models that compare favorably with alternatives in simulation quality and prediction accuracy.' volume: 9 URL: https://proceedings.mlr.press/v9/siddiqi10a.html PDF: http://proceedings.mlr.press/v9/siddiqi10a/siddiqi10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-siddiqi10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Sajid family: Siddiqi - given: Byron family: Boots - given: Geoffrey family: Gordon editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 741-748 id: siddiqi10a issued: date-parts: - 2010 - 3 - 31 firstpage: 741 lastpage: 748 published: 2010-03-31 00:00:00 +0000 - title: 'Detecting Weak but Hierarchically-Structured Patterns in Networks' abstract: 'The ability to detect weak distributed activation patterns in networks is critical to several applications, such as identifying the onset of anomalous activity or incipient congestion in the Internet, or faint traces of a biochemical spread by a sensor network. This is a challenging problem since weak distributed patterns can be invisible in per node statistics as well as a global network-wide aggregate. Most prior work considers situations in which the activation/non-activation of each node is statistically independent, but this is unrealistic in many problems. In this paper, we consider structured patterns arising from statistical dependencies in the activation process. Our contributions are three-fold. 
First, we propose a sparsifying transform that succinctly represents structured activation patterns that conform to a hierarchical dependency graph. Second, we establish that the proposed transform facilitates detection of very weak activation patterns that cannot be detected with existing methods. Third, we show that the structure of the hierarchical dependency graph governing the activation process, and hence the network transform, can be learnt from very few (logarithmic in network size) independent snapshots of network activity.' volume: 9 URL: https://proceedings.mlr.press/v9/singh10a.html PDF: http://proceedings.mlr.press/v9/singh10a/singh10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-singh10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Aarti family: Singh - given: Robert family: Nowak - given: Robert family: Calderbank editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 749-756 id: singh10a issued: date-parts: - 2010 - 3 - 31 firstpage: 749 lastpage: 756 published: 2010-03-31 00:00:00 +0000 - title: 'Inference of Sparse Networks with Unobserved Variables. Application to Gene Regulatory Networks' abstract: 'Networks are becoming a unifying framework for modeling complex systems, and network inference problems are frequently encountered in many fields. Here, I develop and apply a generative approach to network inference (RCweb) for the case when the network is sparse and the latent (not observed) variables affect the observed ones. From all possible factor analysis (FA) decompositions explaining the variance in the data, RCweb selects the FA decomposition that is consistent with a sparse underlying network. The sparsity constraint is imposed by a novel method that significantly outperforms (in terms of accuracy, robustness to noise, complexity scaling and computational efficiency) methods using $\ell_1$-norm relaxation such as K-SVD and $\ell_1$-based sparse principal component analysis (PCA). Results from simulated models demonstrate that RCweb exactly recovers the model structures for sparsity as low (as non-sparse) as 50% and with a ratio of unobserved to observed variables as high as 2. RCweb is robust to noise, with gradual decrease in the parameter ranges as the noise level increases.' volume: 9 URL: https://proceedings.mlr.press/v9/slavov10a.html PDF: http://proceedings.mlr.press/v9/slavov10a/slavov10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-slavov10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Nikolai family: Slavov editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 757-764 id: slavov10a issued: date-parts: - 2010 - 3 - 31 firstpage: 757 lastpage: 764 published: 2010-03-31 00:00:00 +0000 - title: 'Nonparametric Tree Graphical Models' abstract: 'We introduce a nonparametric representation for graphical models on trees which expresses marginals as Hilbert space embeddings and conditionals as embedding operators. This formulation allows us to define a graphical model solely on the basis of the feature space representation of its variables. 
Thus, this nonparametric model can be applied to general domains where kernels are defined, handling challenging cases such as discrete variables whose domains are huge, or very complex, non-Gaussian continuous distributions. We also derive kernel belief propagation, a Hilbert-space algorithm for performing inference in our model. We show that our method outperforms state-of-the-art techniques in a cross-lingual document retrieval task and a camera rotation estimation problem.' volume: 9 URL: https://proceedings.mlr.press/v9/song10a.html PDF: http://proceedings.mlr.press/v9/song10a/song10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-song10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Le family: Song - given: Arthur family: Gretton - given: Carlos family: Guestrin editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 765-772 id: song10a issued: date-parts: - 2010 - 3 - 31 firstpage: 765 lastpage: 772 published: 2010-03-31 00:00:00 +0000 - title: 'On the relation between universality, characteristic kernels and RKHS embedding of measures' abstract: 'Universal kernels have been shown to play an important role in the achievability of the Bayes risk by many kernel-based algorithms that include binary classification, regression, etc. In this paper, we propose a notion of universality that generalizes the notions introduced by Steinwart and Micchelli et al. and study the necessary and sufficient conditions for a kernel to be universal. We show that all these notions of universality are closely linked to the injective embedding of a certain class of Borel measures into a reproducing kernel Hilbert space (RKHS). By exploiting this relation between universality and the embedding of Borel measures into an RKHS, we establish the relation between universal and characteristic kernels. The latter have been proposed in the context of the RKHS embedding of probability measures, used in statistical applications like homogeneity testing, independence testing, etc.' volume: 9 URL: https://proceedings.mlr.press/v9/sriperumbudur10a.html PDF: http://proceedings.mlr.press/v9/sriperumbudur10a/sriperumbudur10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sriperumbudur10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Bharath family: Sriperumbudur - given: Kenji family: Fukumizu - given: Gert family: Lanckriet editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 773-780 id: sriperumbudur10a issued: date-parts: - 2010 - 3 - 31 firstpage: 773 lastpage: 780 published: 2010-03-31 00:00:00 +0000 - title: 'Conditional Density Estimation via Least-Squares Density Ratio Estimation' abstract: 'Estimating the conditional mean of an input-output relation is the goal of regression. However, regression analysis is not sufficiently informative if the conditional distribution has multi-modality, is highly asymmetric, or contains heteroscedastic noise. In such scenarios, estimating the conditional distribution itself would be more useful. 
In this paper, we propose a novel method of conditional density estimation that is suitable for multi-dimensional continuous variables. The basic idea of the proposed method is to express the conditional density in terms of the density ratio, and the ratio is directly estimated without going through density estimation. Experiments using benchmark and robot transition datasets illustrate the usefulness of the proposed approach.' volume: 9 URL: https://proceedings.mlr.press/v9/sugiyama10a.html PDF: http://proceedings.mlr.press/v9/sugiyama10a/sugiyama10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sugiyama10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Masashi family: Sugiyama - given: Ichiro family: Takeuchi - given: Taiji family: Suzuki - given: Takafumi family: Kanamori - given: Hirotaka family: Hachiya - given: Daisuke family: Okanohara editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 781-788 id: sugiyama10a issued: date-parts: - 2010 - 3 - 31 firstpage: 781 lastpage: 788 published: 2010-03-31 00:00:00 +0000 - title: 'On the Convergence Properties of Contrastive Divergence' abstract: 'Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD$_1$ update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely. Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.' volume: 9 URL: https://proceedings.mlr.press/v9/sutskever10a.html PDF: http://proceedings.mlr.press/v9/sutskever10a/sutskever10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sutskever10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ilya family: Sutskever - given: Tijmen family: Tieleman editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 789-795 id: sutskever10a issued: date-parts: - 2010 - 3 - 31 firstpage: 789 lastpage: 795 published: 2010-03-31 00:00:00 +0000 - title: 'Inference and Learning in Networks of Queues' abstract: 'Probabilistic models of the performance of computer systems are useful both for predicting system performance in new conditions, and for diagnosing past performance problems. The most popular performance models are networks of queues. However, no current methods exist for parameter estimation or inference in networks of queues with missing data. In this paper, we present a novel viewpoint that combines queueing networks and graphical models, allowing Markov chain Monte Carlo to be applied. We demonstrate the effectiveness of our sampler on real-world data from a benchmark Web application.'
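The structure that lets queueing networks be combined with graphical models is visible already in a single FIFO queue: given arrival times and service demands, departure times follow a deterministic recursion, so unobserved quantities become latent variables that an MCMC sampler can target. A minimal single-queue sketch of that recursion (ours, not the paper's network model):

```python
def fifo_departures(arrivals, services):
    """Departure times of a single-server FIFO queue (Lindley recursion).

    Given arrival times and service demands, departures are a
    deterministic function: d[i] = max(a[i], d[i-1]) + s[i].
    Missing service times can therefore be treated as latent variables
    and sampled, which is the structure a graphical-model view exploits.
    """
    departures = []
    last = 0.0
    for a, s in zip(arrivals, services):
        last = max(a, last) + s
        departures.append(last)
    return departures
```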
volume: 9 URL: https://proceedings.mlr.press/v9/sutton10a.html PDF: http://proceedings.mlr.press/v9/sutton10a/sutton10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sutton10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Charles family: Sutton - given: Michael I. family: Jordan editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 796-803 id: sutton10a issued: date-parts: - 2010 - 3 - 31 firstpage: 796 lastpage: 803 published: 2010-03-31 00:00:00 +0000 - title: 'Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation' abstract: 'The goal of sufficient dimension reduction in supervised learning is to find the low dimensional subspace of input features that is "sufficient" for predicting output values. In this paper, we propose a novel sufficient dimension reduction method using a squared-loss variant of mutual information as a dependency measure. We utilize an analytic approximator of squared-loss mutual information based on density ratio estimation, which is shown to possess suitable convergence properties. We then develop a natural gradient algorithm for sufficient subspace search. Numerical experiments show that the proposed method compares favorably with existing dimension reduction approaches.' volume: 9 URL: https://proceedings.mlr.press/v9/suzuki10a.html PDF: http://proceedings.mlr.press/v9/suzuki10a/suzuki10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-suzuki10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Taiji family: Suzuki - given: Masashi family: Sugiyama editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 804-811 id: suzuki10a issued: date-parts: - 2010 - 3 - 31 firstpage: 804 lastpage: 811 published: 2010-03-31 00:00:00 +0000 - title: 'HOP-MAP: Efficient Message Passing with High Order Potentials' abstract: 'There is a growing interest in building probabilistic models with high order potentials (HOPs), or interactions, among discrete variables. Message passing inference in such models generally takes time exponential in the size of the interaction, but in some cases maximum a posteriori (MAP) inference can be carried out efficiently. We build upon such results, introducing two new classes, including composite HOPs that allow us to flexibly combine tractable HOPs using simple logical switching rules. We present efficient message update algorithms for the new HOPs, and we improve upon the efficiency of message updates for a general class of existing HOPs. Importantly, we present both new and existing HOPs in a common representation; performing inference with any combination of these HOPs requires no change of representations or new derivations.' 
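For a taste of why certain high order potentials admit fast updates, consider a hard cardinality potential allowing at most K variables on, with unary log-potentials: MAP decoding reduces to a sort even though the potential couples all variables. A sketch with our own names; the message updates in the paper compute more refined quantities than this single MAP state:

```python
import numpy as np

def map_at_most_k(scores, K):
    """MAP assignment for binary variables with unary scores under a
    hard 'at most K variables on' high-order potential.

    Sorting makes this O(n log n): switch on the highest-scoring
    variables with positive score, up to K of them.
    """
    order = np.argsort(-scores)          # indices by decreasing score
    x = np.zeros(len(scores), dtype=int)
    for i in order[:K]:
        if scores[i] > 0:                # only positive scores help
            x[i] = 1
    return x
```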
volume: 9 URL: https://proceedings.mlr.press/v9/tarlow10a.html PDF: http://proceedings.mlr.press/v9/tarlow10a/tarlow10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-tarlow10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Daniel family: Tarlow - given: Inmar family: Givoni - given: Richard family: Zemel editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 812-819 id: tarlow10a issued: date-parts: - 2010 - 3 - 31 firstpage: 812 lastpage: 819 published: 2010-03-31 00:00:00 +0000 - title: 'Hartigan’s Method: k-means Clustering without Voronoi' abstract: 'Hartigan’s method for $k$-means clustering is the following greedy heuristic: select a point, and optimally reassign it. This paper develops two other formulations of the heuristic, one leading to a number of consistency properties, the other showing that the data partition is always quite separated from the induced Voronoi partition. A characterization of the volume of this separation is provided. Empirical tests verify not only good optimization performance relative to Lloyd’s method, but also good running time.' volume: 9 URL: https://proceedings.mlr.press/v9/telgarsky10a.html PDF: http://proceedings.mlr.press/v9/telgarsky10a/telgarsky10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-telgarsky10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Matus family: Telgarsky - given: Andrea family: Vattani editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 820-827 id: telgarsky10a issued: date-parts: - 2010 - 3 - 31 firstpage: 820 lastpage: 827 published: 2010-03-31 00:00:00 +0000 - title: 'Learning Policy Improvements with Path Integrals' abstract: 'With the goal of generating more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests using the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition as to why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. 
We believe that Policy Improvement with Path Integrals (PI$^2$) currently offers one of the most efficient, numerically robust, and easy-to-implement algorithms for RL based on trajectory roll-outs.' volume: 9 URL: https://proceedings.mlr.press/v9/theodorou10a.html PDF: http://proceedings.mlr.press/v9/theodorou10a/theodorou10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-theodorou10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Evangelos family: Theodorou - given: Jonas family: Buchli - given: Stefan family: Schaal editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 828-835 id: theodorou10a issued: date-parts: - 2010 - 3 - 31 firstpage: 828 lastpage: 835 published: 2010-03-31 00:00:00 +0000 - title: 'Unsupervised Aggregation for Classification Problems with Large Numbers of Categories' abstract: 'Classification problems with a very large or unbounded set of output categories are common in many areas such as natural language and image processing. In order to improve accuracy on these tasks, it is natural for a decision-maker to combine predictions from various sources. However, supervised data needed to fit an aggregation model is often difficult to obtain, especially if needed for multiple domains. Therefore, we propose a generative model for unsupervised aggregation which exploits the agreement signal to estimate the expertise of individual judges. Due to the large output space size, this aggregation model cannot encode expertise of constituent judges with respect to every category for all problems. Consequently, we extend it by incorporating the notion of category types to account for variability of the judge expertise depending on the type. The viability of our approach is demonstrated both on synthetic experiments and on a practical task of syntactic parser aggregation.' volume: 9 URL: https://proceedings.mlr.press/v9/titov10a.html PDF: http://proceedings.mlr.press/v9/titov10a/titov10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-titov10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ivan family: Titov - given: Alexandre family: Klementiev - given: Kevin family: Small - given: Dan family: Roth editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 836-843 id: titov10a issued: date-parts: - 2010 - 3 - 31 firstpage: 836 lastpage: 843 published: 2010-03-31 00:00:00 +0000 - title: 'Bayesian Gaussian Process Latent Variable Model' abstract: 'We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. 
The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable to training Gaussian process models in the presence of missing or uncertain inputs.' volume: 9 URL: https://proceedings.mlr.press/v9/titsias10a.html PDF: http://proceedings.mlr.press/v9/titsias10a/titsias10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-titsias10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Michalis family: Titsias - given: Neil D. family: Lawrence editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 844-851 id: titsias10a issued: date-parts: - 2010 - 3 - 31 firstpage: 844 lastpage: 851 published: 2010-03-31 00:00:00 +0000 - title: 'A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping' abstract: 'A Markov-Chain Monte Carlo based algorithm is provided to solve the simultaneous localization and mapping (SLAM) problem with general dynamical and observation models under open-loop control, provided that the map representation is finite-dimensional. To our knowledge, this is the first provably consistent yet (close to) practical solution to this problem. The superiority of our algorithm over alternative SLAM algorithms is demonstrated in a difficult loop-closing situation.' volume: 9 URL: https://proceedings.mlr.press/v9/torma10a.html PDF: http://proceedings.mlr.press/v9/torma10a/torma10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-torma10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Peter family: Torma - given: András family: György - given: Csaba family: Szepesvári editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 852-859 id: torma10a issued: date-parts: - 2010 - 3 - 31 firstpage: 852 lastpage: 859 published: 2010-03-31 00:00:00 +0000 - title: 'Learning Causal Structure from Overlapping Variable Sets' abstract: 'We present an algorithm named cSAT+ for learning the causal structure of a domain from datasets measuring different variable sets. The algorithm outputs a graph with edges corresponding to all possible pairwise causal relations between two variables, called a Pairwise Causal Graph (PCG). Examples of interesting inferences include the induction of the absence or presence of some causal relation between two variables never measured together. cSAT+ converts the problem to a series of SAT problems, obtaining leverage from the efficiency of state-of-the-art solvers. In our empirical evaluation, it is shown to outperform ION, the first algorithm solving a similar but more general problem, by two orders of magnitude.'
volume: 9 URL: https://proceedings.mlr.press/v9/triantafillou10a.html PDF: http://proceedings.mlr.press/v9/triantafillou10a/triantafillou10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-triantafillou10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Sofia family: Triantafillou - given: Ioannis family: Tsamardinos - given: Ioannis family: Tollis editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 860-867 id: triantafillou10a issued: date-parts: - 2010 - 3 - 31 firstpage: 860 lastpage: 867 published: 2010-03-31 00:00:00 +0000 - title: 'State-Space Inference and Learning with Gaussian Processes' abstract: 'State-space inference and learning with Gaussian processes (GPs) is an unsolved problem. We propose a new, general methodology for inference and learning in nonlinear state-space models that are described probabilistically by non-parametric GP models. We apply the expectation maximization algorithm to iterate between inference in the latent state-space and learning the parameters of the underlying GP dynamics model.' volume: 9 URL: https://proceedings.mlr.press/v9/turner10a.html PDF: http://proceedings.mlr.press/v9/turner10a/turner10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-turner10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ryan family: Turner - given: Marc family: Deisenroth - given: Carl family: Rasmussen editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 868-875 id: turner10a issued: date-parts: - 2010 - 3 - 31 firstpage: 868 lastpage: 875 published: 2010-03-31 00:00:00 +0000 - title: 'Sequential Monte Carlo Samplers for Dirichlet Process Mixtures' abstract: 'In this paper, we develop a novel online algorithm based on the Sequential Monte Carlo (SMC) samplers framework for posterior inference in Dirichlet Process Mixtures (DPM). Our method generalizes many sequential importance sampling approaches. It provides a computationally efficient improvement to particle filtering that is less prone to getting stuck in isolated modes. The proposed method is a particular SMC sampler that enables us to design sophisticated clustering update schemes, such as updating past trajectories of the particles in light of recent observations, and still ensures convergence to the true DPM target distribution asymptotically. Performance has been evaluated in a Bayesian infinite Gaussian mixture density estimation problem and it is shown that the proposed algorithm outperforms conventional Monte Carlo approaches in terms of estimation variance and average log-marginal likelihood.'
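For readers new to this line of work, the sketch below (ours, not the authors' algorithm) shows the plain sequential-importance-sampling baseline that such SMC samplers improve on, for a Dirichlet process mixture of one-dimensional Gaussians with known noise variance; the paper's method additionally revisits past assignments, which this simple filter cannot.

```python
# A minimal sketch, not the paper's algorithm: sequential importance sampling
# with resampling for a DPM of 1-D Gaussians (known noise variance sigma2,
# N(0, tau2) prior on cluster means). Constants below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma2, tau2 = 1.0, 0.5, 4.0
N_PARTICLES = 100

def predictive_logpdf(x, members):
    # Conjugate posterior-predictive log density of x under one cluster.
    n = len(members)
    post_var = 1.0 / (n / sigma2 + 1.0 / tau2)
    mean = post_var * sum(members) / sigma2
    var = post_var + sigma2
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def smc_dpm(data):
    particles = [[] for _ in range(N_PARTICLES)]   # each particle: list of clusters
    logw = np.zeros(N_PARTICLES)
    for t, x in enumerate(data):
        for i, clusters in enumerate(particles):
            # CRP prior over existing clusters plus a new one ...
            prior = np.array([len(c) for c in clusters] + [alpha], float)
            prior /= t + alpha
            # ... times the predictive likelihood of x under each choice.
            loglik = np.array([predictive_logpdf(x, c) for c in clusters]
                              + [predictive_logpdf(x, [])])
            post = prior * np.exp(loglik - loglik.max())
            logw[i] += np.log(post.sum()) + loglik.max()   # incremental weight
            k = rng.choice(len(post), p=post / post.sum())
            if k == len(clusters):
                clusters.append([])
            clusters[k].append(x)
        # Resample whenever the effective sample size collapses.
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < N_PARTICLES / 2:
            idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=w)
            particles = [[list(c) for c in particles[j]] for j in idx]
            logw[:] = 0.0
    return particles, logw

data = np.concatenate([rng.normal(-3, 0.7, 50), rng.normal(3, 0.7, 50)])
particles, logw = smc_dpm(rng.permutation(data))
```

Because assignments made early are never revised here, the filter can get stuck in poor clusterings; the SMC sampler's moves over past trajectories are precisely what addresses this.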
volume: 9 URL: https://proceedings.mlr.press/v9/ulker10a.html PDF: http://proceedings.mlr.press/v9/ulker10a/ulker10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ulker10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Yener family: Ulker - given: Bilge family: Günsel - given: Taylan family: Cemgil editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 876-883 id: ulker10a issued: date-parts: - 2010 - 3 - 31 firstpage: 876 lastpage: 883 published: 2010-03-31 00:00:00 +0000 - title: 'Guarantees for Approximate Incremental SVMs' abstract: 'Assume a teacher provides examples one by one. An approximate incremental SVM computes a sequence of classifiers that are close to the true SVM solutions computed on the successive incremental training sets. We show that simple algorithms can satisfy an averaged accuracy criterion with a computational cost that scales as well as the best SVM algorithms with the number of examples. Finally, we exhibit some experiments highlighting the benefits of combining fast incremental optimization with curriculum and active learning (Schohn and Cohn, 2000; Bordes et al., 2005; Bengio et al., 2009).' volume: 9 URL: https://proceedings.mlr.press/v9/usunier10a.html PDF: http://proceedings.mlr.press/v9/usunier10a/usunier10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-usunier10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Nicolas family: Usunier - given: Antoine family: Bordes - given: Léon family: Bottou editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 884-891 id: usunier10a issued: date-parts: - 2010 - 3 - 31 firstpage: 884 lastpage: 891 published: 2010-03-31 00:00:00 +0000 - title: 'An Alternative Prior Process for Nonparametric Bayesian Clustering' abstract: 'Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit “rich-get-richer” characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the “rich-get-richer” property is undesirable. We also explore the cost of this new process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. Finally, we compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.'
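To make the contrast concrete, the following small sketch (standard predictive rules, with $\theta$ as the new-cluster weight; our illustration, not the authors' code) samples partitions under the Dirichlet process and the uniform process side by side, exposing the rich-get-richer effect the uniform process removes.

```python
# A small sketch of the two predictive rules: the Dirichlet process (CRP)
# reinforces large clusters, while the uniform process treats every existing
# cluster equally regardless of its size.
import numpy as np

rng = np.random.default_rng(1)

def sample_partition(n, theta, rule):
    sizes = []                                # current cluster sizes
    for _ in range(n):
        if rule == "crp":                     # P(k) ∝ n_k, P(new) ∝ theta
            weights = np.array(sizes + [theta], float)
        else:                                 # uniform: P(k) ∝ 1, P(new) ∝ theta
            weights = np.array([1.0] * len(sizes) + [theta])
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sorted(sizes, reverse=True)

print("Dirichlet process:", sample_partition(1000, theta=1.0, rule="crp"))
print("uniform process:  ", sample_partition(1000, theta=1.0, rule="uniform"))
```

Running this, the CRP typically produces a few dominant clusters with a long tail, whereas the uniform process yields many clusters of comparable size, which is the behavior motivating the paper.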
volume: 9 URL: https://proceedings.mlr.press/v9/wallach10a.html PDF: http://proceedings.mlr.press/v9/wallach10a/wallach10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-wallach10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Hanna family: Wallach - given: Shane family: Jensen - given: Lee family: Dicker - given: Katherine family: Heller editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 892-899 id: wallach10a issued: date-parts: - 2010 - 3 - 31 firstpage: 892 lastpage: 899 published: 2010-03-31 00:00:00 +0000 - title: 'A Potential-based Framework for Online Multi-class Learning with Partial Feedback' abstract: 'We study the problem of online multi-class learning with partial feedback: in each trial of online learning, instead of providing the true class label for a given instance, the oracle will only reveal to the learner whether the predicted class label is correct. We present a general framework for online multi-class learning with partial feedback that adapts the potential-based gradient descent approaches (Cesa-Bianchi & Lugosi, 2006). The generality of the proposed framework is verified by the fact that the Banditron (Kakade et al., 2008) is a special case of our framework when the potential function is set to the squared $L_2$ norm of the weight vector. We propose an exponential gradient algorithm for online multi-class learning with partial feedback. Compared to the Banditron algorithm, the exponential gradient algorithm is advantageous in that its mistake bound is independent of the data dimension, making it suitable for classifying high-dimensional data. Our empirical study with four data sets shows that the proposed algorithm for online learning with partial feedback is more effective than the Banditron algorithm.' volume: 9 URL: https://proceedings.mlr.press/v9/wang10a.html PDF: http://proceedings.mlr.press/v9/wang10a/wang10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-wang10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Shijun family: Wang - given: Rong family: Jin - given: Hamed family: Valizadegan editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 900-907 id: wang10a issued: date-parts: - 2010 - 3 - 31 firstpage: 900 lastpage: 907 published: 2010-03-31 00:00:00 +0000 - title: 'Online Passive-Aggressive Algorithms on a Budget' abstract: 'In this paper, a kernel-based online learning algorithm with both constant space and constant update time is proposed. The approach is based on the popular online Passive-Aggressive (PA) algorithm. When used in conjunction with a kernel function, the number of support vectors in PA grows without bound when learning from noisy data streams. This implies unlimited memory and ever-increasing model update and prediction time. To address this issue, the proposed budgeted PA algorithm maintains only a fixed number of support vectors. By introducing an additional constraint to the original PA optimization problem, a closed-form solution is derived for support vector removal and model update.
Using the hinge loss, we developed several budgeted PA algorithms that trade off accuracy against update cost. We also developed ramp-loss versions of both the original and budgeted PA and showed that the resulting algorithms can be interpreted as a combination of active learning and hinge-loss PA. All proposed algorithms were comprehensively tested on 7 benchmark data sets. The experiments showed that they are superior to the existing budgeted online algorithms. Even with modest budgets, the budgeted PA achieved accuracies highly competitive with the non-budgeted PA and kernel perceptron algorithms.' volume: 9 URL: https://proceedings.mlr.press/v9/wang10b.html PDF: http://proceedings.mlr.press/v9/wang10b/wang10b.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-wang10b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Zhuang family: Wang - given: Slobodan family: Vucetic editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 908-915 id: wang10b issued: date-parts: - 2010 - 3 - 31 firstpage: 908 lastpage: 915 published: 2010-03-31 00:00:00 +0000 - title: 'Structured Prediction Cascades' abstract: 'Structured prediction tasks pose a fundamental trade-off between the need for model complexity to increase predictive power and the limited computational resources for inference in the exponentially-sized output spaces such models require. We formulate and develop structured prediction cascades: a sequence of increasingly complex models that progressively filter the space of possible outputs. We represent an exponentially large set of filtered outputs using max marginals and propose a novel convex loss function that balances filtering error with filtering efficiency. We provide generalization bounds for these loss functions and evaluate our approach on handwriting recognition and part-of-speech tagging. We find that the learned cascades are capable of reducing the complexity of inference by up to five orders of magnitude, enabling the use of models which incorporate higher-order features and yield higher accuracy.' volume: 9 URL: https://proceedings.mlr.press/v9/weiss10a.html PDF: http://proceedings.mlr.press/v9/weiss10a/weiss10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-weiss10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: David family: Weiss - given: Benjamin family: Taskar editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 916-923 id: weiss10a issued: date-parts: - 2010 - 3 - 31 firstpage: 916 lastpage: 923 published: 2010-03-31 00:00:00 +0000 - title: 'Dependent Indian Buffet Processes' abstract: 'Latent variable models represent hidden structure in observational data. To account for the distribution of the observational data changing over time, space or some other covariate, we need generalizations of latent variable models that explicitly capture this dependency on the covariate. A variety of such generalizations has been proposed for latent variable models based on the Dirichlet process.
We address dependency on covariates in binary latent feature models by introducing a dependent Indian buffet process. The model generates, for each value of the covariate, a binary random matrix with an unbounded number of columns. Evolution of the binary matrices over the covariate set is controlled by a hierarchical Gaussian process model. The choice of covariance functions controls the dependence structure and exchangeability properties of the model. We derive a Markov Chain Monte Carlo sampling algorithm for Bayesian inference, and provide experiments on both synthetic and real-world data. The experimental results show that explicit modeling of dependencies significantly improves the accuracy of predictions.' volume: 9 URL: https://proceedings.mlr.press/v9/williamson10a.html PDF: http://proceedings.mlr.press/v9/williamson10a/williamson10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-williamson10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Sinead family: Williamson - given: Peter family: Orbanz - given: Zoubin family: Ghahramani editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 924-931 id: williamson10a issued: date-parts: - 2010 - 3 - 31 firstpage: 924 lastpage: 931 published: 2010-03-31 00:00:00 +0000 - title: 'Modeling annotator expertise: Learning when everybody knows a bit of something' abstract: 'Supervised learning from multiple labeling sources is an increasingly important problem in machine learning and data mining. This paper develops a probabilistic approach to this problem when annotators may be unreliable (labels are noisy) and their expertise varies depending on the data they observe (annotators may have knowledge about different parts of the input space). That is, an annotator may not be consistently accurate (or inaccurate) across the task domain. The presented approach produces classification and annotator models that allow us to provide estimates of the true labels and of each annotator’s varying expertise. We provide an analysis of the proposed model under various scenarios and show experimentally that annotator expertise can indeed vary in real tasks and that the presented approach provides clear advantages over previously introduced multi-annotator methods, which only consider general annotator characteristics.'
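As a toy illustration of input-dependent expertise (our construction, not the authors' model), the sketch below simulates annotators who are accurate only on part of the input space and shows how region-wise expertise, estimated from majority-vote pseudo-labels, improves the aggregated labels.

```python
# A toy simulation, not the paper's model: annotator accuracy that varies over
# input space, estimated via one EM-style step from majority-vote pseudo-labels
# and then used for an expertise-weighted vote.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(-1, 1, n)
y = (x + 0.1 * rng.normal(size=n) > 0).astype(int)   # true labels (hidden)

def annotate(acc_left, acc_right):
    # Each annotator is accurate in one region of x and noisy elsewhere.
    acc = np.where(x < 0, acc_left, acc_right)
    flip = rng.uniform(size=n) > acc
    return np.where(flip, 1 - y, y)

labels = np.stack([annotate(0.95, 0.55),   # expert for x < 0
                   annotate(0.55, 0.95),   # expert for x >= 0
                   annotate(0.70, 0.70)])  # mediocre everywhere

pseudo = (labels.mean(axis=0) > 0.5).astype(int)     # E-step: majority vote
region = (x >= 0).astype(int)
# M-step: per-annotator, per-region agreement with the pseudo-labels.
expertise = np.array([[np.mean(labels[a][region == r] == pseudo[region == r])
                       for r in (0, 1)] for a in range(3)])
expertise = np.clip(expertise, 0.01, 0.99)
# Weighted vote with log-odds weights that depend on the input's region.
w = np.log(expertise / (1 - expertise))[:, region]   # shape (annotators, n)
weighted = ((w * (2 * labels - 1)).sum(axis=0) > 0).astype(int)

print("majority vote accuracy:     ", np.mean(pseudo == y))
print("expertise-weighted accuracy:", np.mean(weighted == y))
```

Even this one-step scheme recovers most of the gain: the local expert dominates the vote in its own region, which is the effect the paper's full probabilistic model captures jointly with the classifier.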
volume: 9 URL: https://proceedings.mlr.press/v9/yan10a.html PDF: http://proceedings.mlr.press/v9/yan10a/yan10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-yan10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Yan family: Yan - given: Romer family: Rosales - given: Glenn family: Fung - given: Mark family: Schmidt - given: Gerardo family: Hermosillo - given: Luca family: Bogoni - given: Linda family: Moy - given: Jennifer family: Dy editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 932-939 id: yan10a issued: date-parts: - 2010 - 3 - 31 firstpage: 932 lastpage: 939 published: 2010-03-31 00:00:00 +0000 - title: 'A highly efficient blocked Gibbs sampler reconstruction of multidimensional NMR spectra' abstract: 'Projection Reconstruction Nuclear Magnetic Resonance (PR-NMR) is a new technique to generate multi-dimensional NMR spectra, which have discrete features that are relatively sparsely distributed in space. A small number of projections from lower-dimensional NMR spectra are used to reconstruct the multi-dimensional NMR spectra. We propose an efficient algorithm which employs a blocked Gibbs sampler to accurately reconstruct NMR spectra. This statistical method generates samples within a Bayesian scheme. Our proposed algorithm is tested on a set of six projections derived from the three-dimensional 700 MHz HNCO spectrum of HasA, a 187-residue heme-binding protein.' volume: 9 URL: https://proceedings.mlr.press/v9/yoon10a.html PDF: http://proceedings.mlr.press/v9/yoon10a/yoon10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-yoon10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Ji Won family: Yoon - given: Simon family: Wilson - given: K. Hun family: Mok editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 940-947 id: yoon10a issued: date-parts: - 2010 - 3 - 31 firstpage: 940 lastpage: 947 published: 2010-03-31 00:00:00 +0000 - title: 'Risk Bounds for Levy Processes in the PAC-Learning Framework' abstract: 'Levy processes play an important role in stochastic process theory. However, since samples are non-i.i.d., statistical learning results based on i.i.d. scenarios cannot be utilized to study the risk bounds for Levy processes. In this paper, we present risk bounds for non-i.i.d. samples drawn from Levy processes in the PAC-learning framework. In particular, by using a concentration inequality for infinitely divisible distributions, we first prove that the risk error function is Lipschitz continuous with high probability, and then by using a specific concentration inequality for Levy processes, we obtain risk bounds for non-i.i.d. samples drawn from Levy processes without Gaussian components. Based on the resulting risk bounds, we analyze the factors that affect the convergence of the risk bounds and then prove the convergence.'
volume: 9 URL: https://proceedings.mlr.press/v9/zhang10a.html PDF: http://proceedings.mlr.press/v9/zhang10a/zhang10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Chao family: Zhang - given: Dacheng family: Tao editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 948-955 id: zhang10a issued: date-parts: - 2010 - 3 - 31 firstpage: 948 lastpage: 955 published: 2010-03-31 00:00:00 +0000 - title: 'Bayesian Online Learning for Multi-label and Multi-variate Performance Measures' abstract: 'Many real-world applications employ multi-variate performance measures, and each example can belong to multiple classes. Currently, the most popular approaches train an SVM for each class, followed by ad hoc thresholding. Probabilistic models using Bayesian decision theory are also commonly adopted. In this paper, we propose a Bayesian online multi-label classification framework (BOMC) which learns a probabilistic linear classifier. The likelihood is modeled by a graphical model similar to TrueSkill$^\text{TM}$, and inference is based on Gaussian density filtering with expectation propagation. Using samples from the posterior, we label the testing data by maximizing the expected $F_1$-score. Our experiments on the Reuters RCV1-v2 dataset show that BOMC compares favorably to state-of-the-art online learners in macro-averaged $F_1$-score and training time.' volume: 9 URL: https://proceedings.mlr.press/v9/zhang10b.html PDF: http://proceedings.mlr.press/v9/zhang10b/zhang10b.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Xinhua family: Zhang - given: Thore family: Graepel - given: Ralf family: Herbrich editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 956-963 id: zhang10b issued: date-parts: - 2010 - 3 - 31 firstpage: 956 lastpage: 963 published: 2010-03-31 00:00:00 +0000 - title: 'Multi-Task Learning using Generalized t Process' abstract: 'Multi-task learning seeks to improve the generalization performance of a learning task with the help of other related learning tasks. Among the multi-task learning methods proposed thus far, Bonilla et al.’s method provides a novel multi-task extension of the Gaussian process (GP) by using a task covariance matrix to model the relationships between tasks. However, learning the task covariance matrix directly has both computational and representational drawbacks. In this paper, we propose a Bayesian extension by modeling the task covariance matrix as a random matrix with an inverse-Wishart prior and integrating it out to achieve Bayesian model averaging. To make the computation feasible, we first give an alternative weight-space view of Bonilla et al.’s multi-task GP model and then integrate out the task covariance matrix in the model, leading to a multi-task generalized t process (MTGTP).
For the likelihood, we use a generalized t noise model which, together with the generalized t process prior, brings both a robustness advantage and an analytical form for the marginal likelihood. To specify the inverse-Wishart prior, we use the maximum mean discrepancy (MMD) statistic to estimate its parameter matrix. Moreover, we investigate some theoretical properties of MTGTP, such as its asymptotic analysis and learning curve. Comparative experimental studies on two common multi-task learning applications show very promising results.' volume: 9 URL: https://proceedings.mlr.press/v9/zhang10c.html PDF: http://proceedings.mlr.press/v9/zhang10c/zhang10c.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Yu family: Zhang - given: Dit–Yan family: Yeung editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 964-971 id: zhang10c issued: date-parts: - 2010 - 3 - 31 firstpage: 964 lastpage: 971 published: 2010-03-31 00:00:00 +0000 - title: 'Bayesian Generalized Kernel Models' abstract: 'We propose a fully Bayesian approach for generalized kernel models (GKMs), which are extensions of generalized linear models in the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman’s g-prior on the regression vector of GKMs. This mixture prior allows a fraction of the regression vector to be zero, and thus supports both sparse modeling and Bayesian computation. For inference, we exploit data augmentation methodology to develop a Markov chain Monte Carlo (MCMC) algorithm in which the reversible jump method is used for model selection and a Bayesian model averaging method is used for posterior prediction.' volume: 9 URL: https://proceedings.mlr.press/v9/zhang10d.html PDF: http://proceedings.mlr.press/v9/zhang10d/zhang10d.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Zhihua family: Zhang - given: Guang family: Dai - given: Donghui family: Wang - given: Michael I. family: Jordan editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 972-979 id: zhang10d issued: date-parts: - 2010 - 3 - 31 firstpage: 972 lastpage: 979 published: 2010-03-31 00:00:00 +0000 - title: 'Matrix-Variate Dirichlet Process Mixture Models' abstract: 'We are concerned with a multivariate response regression problem where the interest is in considering correlations both across response variates and across response samples. In this paper we develop a new Bayesian nonparametric model for such a setting based on Dirichlet process priors. Building on an additive kernel model, we allow each sample to have its own regression matrix. Although this overcomplete representation could in principle suffer from severe overfitting problems, we are able to provide effective control over the model via a matrix-variate Dirichlet process prior on the regression matrices.
Our model is able to share statistical strength among regression matrices due to the clustering property of the Dirichlet process. We make use of a Markov chain Monte Carlo algorithm for inference and prediction. Compared with other Bayesian kernel models, our model has advantages in both computational and statistical efficiency.' volume: 9 URL: https://proceedings.mlr.press/v9/zhang10e.html PDF: http://proceedings.mlr.press/v9/zhang10e/zhang10e.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Zhihua family: Zhang - given: Guang family: Dai - given: Michael I. family: Jordan editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 980-987 id: zhang10e issued: date-parts: - 2010 - 3 - 31 firstpage: 980 lastpage: 987 published: 2010-03-31 00:00:00 +0000 - title: 'Exclusive Lasso for Multi-task Feature Selection' abstract: 'We propose a novel group regularizer, which we call the exclusive lasso. Unlike the group lasso regularizer, which assumes co-varying variables within groups, the proposed exclusive lasso regularizer models the scenario in which variables in the same group compete with each other. Analysis is presented to illustrate the properties of the proposed regularizer. We present a kernel-based multi-task feature selection framework built on the proposed exclusive lasso regularizer. An efficient algorithm is derived to solve the related optimization problem. Experiments with document categorization show that our approach outperforms state-of-the-art algorithms for multi-task feature selection.' volume: 9 URL: https://proceedings.mlr.press/v9/zhou10a.html PDF: http://proceedings.mlr.press/v9/zhou10a/zhou10a.pdf edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhou10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics' publisher: 'PMLR' author: - given: Yang family: Zhou - given: Rong family: Jin - given: Steven Chu–Hong family: Hoi editor: - given: Yee Whye family: Teh - given: Mike family: Titterington address: Chia Laguna Resort, Sardinia, Italy page: 988-995 id: zhou10a issued: date-parts: - 2010 - 3 - 31 firstpage: 988 lastpage: 995 published: 2010-03-31 00:00:00 +0000
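To make the competition effect concrete, the following sketch (helper names ours) computes the exclusive lasso penalty, the squared within-group $L_1$ norm summed over groups, alongside the group lasso penalty; at equal within-group $L_2$ norm, the exclusive lasso is cheaper when a single variable carries the group’s weight, which is what drives within-group competition.

```python
# Illustrative sketch (helper names are ours): exclusive lasso vs. group lasso.
import numpy as np

def exclusive_lasso(w, groups):
    # sum_g ( sum_{j in g} |w_j| )^2 : L1 within groups, squared across groups.
    return sum(np.sum(np.abs(w[g])) ** 2 for g in groups)

def group_lasso(w, groups):
    # sum_g sqrt( sum_{j in g} w_j^2 ) : L2 within groups, L1 across groups.
    return sum(np.sqrt(np.sum(w[g] ** 2)) for g in groups)

groups = [np.array([0, 1]), np.array([2, 3])]
spread = np.array([1.0, 1.0, 1.0, 1.0])                 # weight shared in a group
winner = np.array([np.sqrt(2), 0.0, np.sqrt(2), 0.0])   # same per-group L2 norm
# The exclusive lasso favors one "winning" variable per group, while the group
# lasso is indifferent between the two at equal within-group L2 norm.
print(exclusive_lasso(spread, groups), exclusive_lasso(winner, groups))  # 8.0 4.0
print(group_lasso(spread, groups), group_lasso(winner, groups))          # ~2.83 ~2.83
```

This is the opposite bias to the group lasso: where the group lasso encourages whole groups to switch on or off together, the exclusive lasso spreads the selected features across groups, one competitor winning within each.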