- title: 'Preface'
  abstract: 'Preface to the Proceedings of the Thirteenth International  Conference on Artificial Intelligence and Statistics May 13-15, 2010,  Chia Laguna Resort, Sardinia, Italy.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/teh10a.html
  PDF: http://proceedings.mlr.press/v9/teh10a/teh10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-teh10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: i-v
  id: teh10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: i
  lastpage: v
  published: 2010-03-31 00:00:00 +0000
- title: 'Learning the Structure of Deep Sparse Graphical Models'
  abstract: 'Deep belief networks are a powerful way to model complex probability   distributions.  However, it is difficult to learn the structure of a   belief network, particularly one with hidden units.  The Indian   buffet process has been used as a nonparametric Bayesian prior on   the structure of a directed belief network with a single infinitely   wide hidden layer. Here, we introduce the cascading Indian   buffet process (CIBP), which provides a prior on the structure of a   layered, directed belief network that is unbounded in both depth and   width, yet allows tractable inference.  We use the CIBP prior with   the nonlinear Gaussian belief network framework to allow each unit   to vary its behavior between discrete and continuous   representations.  We use Markov chain Monte Carlo for inference in   this model and explore the structures learned on image data.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/adams10a.html
  PDF: http://proceedings.mlr.press/v9/adams10a/adams10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-adams10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ryan P.
    family: Adams
  - given: Hanna
    family: Wallach
  - given: Zoubin
    family: Ghahramani
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 1-8
  id: adams10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 1
  lastpage: 8
  published: 2010-03-31 00:00:00 +0000
- title: 'Optimal Allocation Strategies for the Dark Pool Problem'
  abstract: 'We study the problem of allocating stocks to dark pools. We propose and analyze an optimal approach for allocations, if continuous-valued allocations are allowed. We also propose a modification for the case when only integer-valued allocations are possible. We extend the previous work on this problem (Ganchev et al., 2009) to adversarial scenarios, while also improving over their results in the iid setup. The resulting algorithms are efficient, and perform well in simulations under stochastic and adversarial inputs.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/agarwal10a.html
  PDF: http://proceedings.mlr.press/v9/agarwal10a/agarwal10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-agarwal10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alekh
    family: Agarwal
  - given: Peter
    family: Bartlett
  - given: Max
    family: Dama
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 9-16
  id: agarwal10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 9
  lastpage: 16
  published: 2010-03-31 00:00:00 +0000
- title: 'Multitask Learning for Brain-Computer Interfaces'
  abstract: 'Brain-computer interfaces (BCIs) are limited in their applicability in everyday settings by the current necessity to record subject-specific calibration data prior to actual use of the BCI for communication. In this paper, we utilize the framework of multitask learning to construct a BCI that can be used without any subject-specific calibration process. We discuss how this out-of-the-box BCI can be further improved in a computationally efficient manner as subject-specific data becomes available. The feasibility of the approach is demonstrated on two sets of experimental EEG data recorded during a standard two-class motor imagery paradigm from a total of 19 healthy subjects. Specifically, we show that satisfactory classification results can be achieved with zero training data, and combining prior recordings with subject-specific calibration data substantially outperforms using subject-specific data only. Our results further show that transfer between recordings under slightly different experimental setups is feasible.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/alamgir10a.html
  PDF: http://proceedings.mlr.press/v9/alamgir10a/alamgir10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-alamgir10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Morteza
    family: Alamgir
  - given: Moritz
    family: Grosse–Wentrup
  - given: Yasemin
    family: Altun
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 17-24
  id: alamgir10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 17
  lastpage: 24
  published: 2010-03-31 00:00:00 +0000
- title: 'Efficient Multioutput Gaussian Processes through Variational Inducing Kernels'
  abstract: 'Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way to construct such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Alvarez and Lawrence recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/alvarez10a.html
  PDF: http://proceedings.mlr.press/v9/alvarez10a/alvarez10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-alvarez10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mauricio
    family: Álvarez
  - given: David
    family: Luengo
  - given: Michalis
    family: Titsias
  - given: Neil D.
    family: Lawrence
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 25-32
  id: alvarez10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 25
  lastpage: 32
  published: 2010-03-31 00:00:00 +0000
- title: 'Learning with Blocks: Composite Likelihood and Contrastive Divergence'
  abstract: 'Composite likelihood methods provide a wide spectrum of computationally efficient techniques for statistical tasks such as parameter estimation and model selection. In this paper, we present a formal connection between the optimization of composite likelihoods and the well-known contrastive divergence algorithm. In particular, we show that composite likelihoods can be stochastically optimized by performing a variant of contrastive divergence with random-scan blocked Gibbs sampling. By using higher-order composite likelihoods, our proposed learning framework makes it possible to trade off computation time for increased accuracy. Furthermore, one can choose composite likelihood blocks that match the model’s dependence structure, making the optimization of higher-order composite likelihoods computationally efficient.  We empirically analyze the performance of blocked contrastive divergence on various models, including visible Boltzmann machines, conditional random fields, and exponential random graph models, and we demonstrate that using higher-order blocks improves both the accuracy of parameter estimates and the rate of convergence.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/asuncion10a.html
  PDF: http://proceedings.mlr.press/v9/asuncion10a/asuncion10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-asuncion10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arthur
    family: Asuncion
  - given: Qiang
    family: Liu
  - given: Alexander
    family: Ihler
  - given: Padhraic
    family: Smyth
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 33-40
  id: asuncion10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 33
  lastpage: 40
  published: 2010-03-31 00:00:00 +0000
- title: 'Deterministic Bayesian inference for the $p*$ model'
  abstract: 'The $p*$ model is widely used in social network analysis. The likelihood of a network under this model is impossible to calculate for all but trivially small networks. Various approximation have been presented in  the literature, and the pseudolikelihood approximation is the most popular. The aim of this paper is to introduce two likelihood approximations which have the pseudolikelihood estimator as a special case. We show, for the examples that we have considered, that both approximations result in improved estimation of model parameters with respect to the standard methodological approaches. We provide a deterministic  approach and also illustrate how Bayesian model choice can be carried out in this setting.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/austad10a.html
  PDF: http://proceedings.mlr.press/v9/austad10a/austad10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-austad10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Haakon
    family: Austad
  - given: Nial
    family: Friel
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 41-48
  id: austad10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 41
  lastpage: 48
  published: 2010-03-31 00:00:00 +0000
- title: 'Half Transductive Ranking'
  abstract: 'We study the standard retrieval task of ranking a fixed set of items given a previously unseen query and pose it as the half transductive ranking problem. The task is transductive as the set of items is fixed. Transductive representations (where the vector representation of each example is learned) allow the generation of highly nonlinear embeddings that capture object relationships without relying on a specific choice of features, and require only relatively simple optimization. Unfortunately, they have no direct out-of-sample extension. Inductive approaches on the other hand allow for the representation of unknown queries. We describe algorithms for this setting which have the advantages of both transductive and inductive approaches, and can be applied in unsupervised (either reconstruction-based or graph-based) and supervised ranking setups. We show empirically that our methods give strong performance on all three tasks.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/bai10a.html
  PDF: http://proceedings.mlr.press/v9/bai10a/bai10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-bai10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Bing
    family: Bai
  - given: Jason
    family: Weston
  - given: David
    family: Grangier
  - given: Ronan
    family: Collobert
  - given: Corinna
    family: Cortes
  - given: Mehryar
    family: Mohri
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 49-56
  id: bai10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 49
  lastpage: 56
  published: 2010-03-31 00:00:00 +0000
- title: 'Kernel Partial Least Squares is Universally Consistent'
  abstract: 'We prove the statistical consistency of kernel Partial Least Squares Regression applied to a bounded regression learning problem on a reproducing kernel Hilbert space. Partial Least Squares  stands out of well-known classical approaches as e.g. Ridge Regression or Principal Components Regression, as it is not defined as the solution of a global cost minimization procedure over a fixed model nor is it a linear estimator. Instead, approximate solutions are constructed by projections onto a nested set of data-dependent subspaces. To prove consistency, we exploit the known fact that Partial Least Squares is equivalent to the conjugate gradient algorithm in combination with early stopping. The choice of the stopping rule (number of iterations) is a crucial point. We study two empirical stopping rules. The first one monitors the estimation error in each iteration step of Partial Least Squares, and the second one estimates the empirical complexity in terms of a condition number. Both stopping rules lead to universally consistent estimators provided the kernel is universal.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/blanchard10a.html
  PDF: http://proceedings.mlr.press/v9/blanchard10a/blanchard10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-blanchard10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gilles
    family: Blanchard
  - given: Nicole
    family: Krämer
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 57-64
  id: blanchard10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 57
  lastpage: 64
  published: 2010-03-31 00:00:00 +0000
- title: 'Towards Understanding Situated Natural Language'
  abstract: 'We present a general framework and learning algorithm for the task of concept labeling: each word in a given sentence has to be tagged with the unique physical entity (e.g. person, object or location) or abstract concept it refers to. Our method allows both world knowledge and linguistic information to be used during learning and prediction. We show experimentally that we can learn to use world knowledge to resolve ambiguities in language, such as word senses or reference resolution, without the use of handcrafted rules or features.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/bordes10a.html
  PDF: http://proceedings.mlr.press/v9/bordes10a/bordes10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-bordes10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Antoine
    family: Bordes
  - given: Nicolas
    family: Usunier
  - given: Ronan
    family: Collobert
  - given: Jason
    family: Weston
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 65-72
  id: bordes10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 65
  lastpage: 72
  published: 2010-03-31 00:00:00 +0000
- title: 'Using Descendants as Instrumental Variables for the Identification of Direct Causal Effects in Linear SEMs'
  abstract: 'In this paper, we present an extended set of graphical criteria for the identification of direct causal effects in linear Structural Equation Models (SEMs). Previous methods of graphical identification of direct causal effects in linear SEMs include methods such as the single-door criterion, the instrumental variable and the IV-pair, and the accessory set. However, there remain graphical models where a direct causal effect can be identified and these graphical criteria all fail. As a result, we introduce a new set of graphical criteria which uses descendants of either the cause variable or the effect variable as “path-specific instrumental variables” for the identification of the direct causal effect as long as certain conditions are satisfied. These conditions are based on edge removal and the existing graphical criteria of instrumental variables, and the identifiability of certain other total effects, and thus can be easily checked.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/chan10a.html
  PDF: http://proceedings.mlr.press/v9/chan10a/chan10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chan10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hei
    family: Chan
  - given: Manabu
    family: Kuroki
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 73-80
  id: chan10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 73
  lastpage: 80
  published: 2010-03-31 00:00:00 +0000
- title: 'Why are DBNs sparse?'
  abstract: 'Real stochastic processes operate in continuous time and can be modeled by sets of stochastic differential equations.  On the other hand, several popular model families, including hidden Markov models and dynamic Bayesian networks (DBNs), use discrete time steps.  This paper explores methods for converting DBNs with infinitesimal time steps into DBNs with finite time steps, to enable efficient simulation and filtering over long periods.  An exact conversion—summing out all intervening time slices between two steps—results in a completely connected DBN, yet nearly all human-constructed DBNs are sparse.  We show how this sparsity arises from well-founded approximations resulting from differences among the natural time scales of the variables in the DBN. We define an automated procedure for constructing a provably accurate, approximate DBN model for any desired time step. We illustrate the method by generating a series of approximations to a simple pH model for the human body, demonstrating speedups of several orders of magnitude compared to the original model.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/chatterjee10a.html
  PDF: http://proceedings.mlr.press/v9/chatterjee10a/chatterjee10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chatterjee10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shaunak
    family: Chatterjee
  - given: Stuart
    family: Russell
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 81-88
  id: chatterjee10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 81
  lastpage: 88
  published: 2010-03-31 00:00:00 +0000
- title: 'Focused Belief Propagation for Query-Specific Inference'
  abstract: 'With the increasing popularity of large-scale probabilistic graphical models, even “lightweight” approximate inference methods are becoming infeasible. Fortunately, often large parts of the model are of no immediate interest to the end user. Given the variable that the user actually cares about, we show how to quantify edge importance in graphical models and to significantly speed up inference by focusing computation on important parts of the model. Our algorithm empirically demonstrates convergence speedup by multiple times over state of the art'
  volume: 9
  URL: https://proceedings.mlr.press/v9/chechetka10a.html
  PDF: http://proceedings.mlr.press/v9/chechetka10a/chechetka10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chechetka10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anton
    family: Chechetka
  - given: Carlos
    family: Guestrin
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 89-96
  id: chechetka10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 89
  lastpage: 96
  published: 2010-03-31 00:00:00 +0000
- title: 'Parametric Herding'
  abstract: 'A parametric version of herding is formulated. The nonlinear mapping between consecutive time slices is learned by a form of self-supervised training. The resulting dynamical system generates pseudo-samples that resemble the original data. We show how this parametric herding can be successfully used to compress a dataset consisting of binary digits. It is also verified that high compression rates translate into good prediction performance on unseen test data.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/chen10a.html
  PDF: http://proceedings.mlr.press/v9/chen10a/chen10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-chen10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yutian
    family: Chen
  - given: Max
    family: Welling
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 97-104
  id: chen10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 97
  lastpage: 104
  published: 2010-03-31 00:00:00 +0000
- title: 'Mass Fatality Incident Identification based on nuclear DNA evidence'
  abstract: 'This paper focuses on the use of nuclear DNA Short Tandem Repeat traits for the identification of the victims of a Mass Fatality Incident. The goal of the analysis is  the assessment of the identification probabilities concerning the  recovered victims.  Identification hypotheses are evaluated conditionally to the DNA evidence observed both on the recovered victims and on the relatives of the missing persons disappeared in the tragical event. After specifying  a set of conditional independence assertions suitable for the problem, an inference strategy is provided, treating some points to achieve computational efficiency.  Finally, the proposal is tested through the simulation of a Mass Fatality Incident and the results are  examined in details.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/corradi10a.html
  PDF: http://proceedings.mlr.press/v9/corradi10a/corradi10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-corradi10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Fabio
    family: Corradi
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 105-112
  id: corradi10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 105
  lastpage: 112
  published: 2010-03-31 00:00:00 +0000
- title: 'On the Impact of Kernel Approximation on Learning Accuracy'
  abstract: 'Kernel approximation is commonly used to scale kernel-based algorithms to applications containing as many as several million instances. This paper analyzes the effect of such approximations in the kernel matrix on the hypothesis generated by several widely used learning algorithms. We give stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, and graph Laplacian-based regularization algorithms. These bounds help determine the degree of approximation that can be tolerated in the estimation of the kernel matrix. Our analysis is general and applies to arbitrary approximations of the kernel matrix. However, we also give a specific analysis of the Nystrom low-rank approximation in this context and report the results of experiments evaluating the quality of the Nystrom low-rank kernel approximation when used with ridge regression.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/cortes10a.html
  PDF: http://proceedings.mlr.press/v9/cortes10a/cortes10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-cortes10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Corinna
    family: Cortes
  - given: Mehryar
    family: Mohri
  - given: Ameet
    family: Talwalkar
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 113-120
  id: cortes10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 113
  lastpage: 120
  published: 2010-03-31 00:00:00 +0000
- title: 'Improving posterior marginal approximations in latent Gaussian models'
  abstract: 'We consider the problem of correcting the posterior marginal approximations computed by expectation propagation and Laplace approximation in latent Gaussian models and propose correction methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace approximation by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates, where the Laplace approximation fails. Inspired by bounds on the marginal corrections, we arrive at factorized approximations, which can be applied on top of both expectation propagation and Laplace. These give nearly indistinguishable results from the non-factorized approximations in a fraction of the time.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/cseke10a.html
  PDF: http://proceedings.mlr.press/v9/cseke10a/cseke10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-cseke10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Botond
    family: Cseke
  - given: Tom
    family: Heskes
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 121-128
  id: cseke10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 121
  lastpage: 128
  published: 2010-03-31 00:00:00 +0000
- title: 'Impossibility Theorems for Domain Adaptation'
  abstract: 'The domain adaptation problem in machine learning occurs when the test data generating distribution differs from the one that generates the training data. It is clear that the success of learning under such circumstances depends on similarities between the two data distributions. We study assumptions about the relationship between the two distributions that one needed for domain adaptation learning to succeed.  We analyze the assumptions in an agnostic PAC-style learning model for a the setting in which the learner can access a labeled training data sample and an unlabeled sample generated by the test data distribution. We focus on three assumptions: (i) Similarity between the unlabeled distributions, (ii) Existence of a classifier in the hypothesis class with low error on both training and testing distributions, and (iii) The covariate shift assumption. I.e., the assumption that the conditioned label distribution (for each data point) is the same for both the training and test distributions.  We show that without either assumption (i) or (ii), the combination of the remaining assumptions is not sufficient to guarantee successful learning. Our negative results hold with respect to any domain adaptation learning algorithm, as long as it does not have access to target labeled examples.  In particular, we provide formal proofs that the popular covariate shift assumption is rather weak and does not relieve the necessity of the other assumptions.  We also discuss the intuitively appealing paradigm of reweighing the labeled training sample according to the target unlabeled distribution. We show that, somewhat counter intuitively, that paradigm cannot be trusted in the following sense. There are DA tasks that are indistinguishable, as far as the input training data goes, but in which reweighing leads to significant improvement in one task, while causing dramatic deterioration of the learning success in the other.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/david10a.html
  PDF: http://proceedings.mlr.press/v9/david10a/david10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-david10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shai Ben
    family: David
  - given: Tyler
    family: Lu
  - given: Teresa
    family: Luu
  - given: David
    family: Pal
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 129-136
  id: david10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 129
  lastpage: 136
  published: 2010-03-31 00:00:00 +0000
- title: 'Multiclass-Multilabel Classification with More Classes than Examples'
  abstract: 'We discuss multiclass-multilabel classification problems in which   the set of possible labels is extremely large. Most existing   multiclass-multilabel learning algorithms expect to observe a   reasonably large sample from each class, and fail if they receive   only a handful of examples with a given label. We propose and   analyze the following two-stage approach: first use an arbitrary   (perhaps heuristic) classification algorithm to construct an initial   classifier, then apply a simple but principled method to augment   this classifier by removing harmful labels from its output.  A   careful theoretical analysis allows us to justify our approach under   some reasonable conditions (such as label sparsity and power-law   distribution of label frequencies), even when the training set does   not provide a statistically accurate representation of most   classes. Surprisingly, our theoretical analysis continues to hold   even when the number of classes exceeds the sample size. We   demonstrate the merits of our approach on the ambitious task of   categorizing the entire web using the 1.5 million categories   defined on Wikipedia.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/dekel10a.html
  PDF: http://proceedings.mlr.press/v9/dekel10a/dekel10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-dekel10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ofer
    family: Dekel
  - given: Ohad
    family: Shamir
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 137-144
  id: dekel10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 137
  lastpage: 144
  published: 2010-03-31 00:00:00 +0000
- title: 'Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines'
  abstract: 'Alternating Gibbs sampling is the most common scheme used for sampling from Restricted Boltzmann Machines (RBM), a crucial component in deep architectures such as Deep Belief Networks. However, we find that it often does a very poor job of rendering the diversity of modes captured by the trained model. We suspect that this hinders the advantage that could in principle be brought by training algorithms relying on Gibbs sampling for uncovering spurious modes, such as the Persistent Contrastive Divergence algorithm. To alleviate this problem, we explore the use of tempered Markov Chain Monte-Carlo for sampling in RBMs.  We find both through visualization of samples and measures of likelihood on a toy dataset that it helps both sampling and learning.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/desjardins10a.html
  PDF: http://proceedings.mlr.press/v9/desjardins10a/desjardins10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-desjardins10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Guillaume
    family: Desjardins
  - given: Aaron
    family: Courville
  - given: Yoshua
    family: Bengio
  - given: Pascal
    family: Vincent
  - given: Olivier
    family: Delalleau
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 145-152
  id: desjardins10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 145
  lastpage: 152
  published: 2010-03-31 00:00:00 +0000
- title: 'Feature Selection using Multiple Streams'
  abstract: 'Feature selection for supervised learning can be greatly improved by making use of the fact that features often come in classes. For example, in gene  expression data, the genes which serve as features may be  divided into classes based on their membership in gene families or pathways. When labeling words with senses for word sense disambiguation, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows  dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are of different sizes and have different (but unknown) fractions of good features. Experimental results show that our approach provides significant improvement in performance and is computationally less expensive than comparable “batch” methods that do not take advantage of the feature classes and expect all features to be known in advance.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/dhillon10a.html
  PDF: http://proceedings.mlr.press/v9/dhillon10a/dhillon10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-dhillon10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Paramveer
    family: Dhillon
  - given: Dean
    family: Foster
  - given: Lyle
    family: Ungar
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 153-160
  id: dhillon10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 153
  lastpage: 160
  published: 2010-03-31 00:00:00 +0000
- title: 'Bayesian variable order Markov models'
  abstract: 'We present a simple, effective generalisation of variable order   Markov models to full online Bayesian estimation. The mechanism used   is close to that employed in context tree weighting. The main   contribution is the addition of a prior, conditioned on context, on   the Markov order. The resulting construction uses a simple recursion   and can be updated efficiently. This allows the model to make   predictions using more complex contexts, as more data is acquired,   if necessary.  In addition, our model can be alternatively seen as a   mixture of tree experts.  Experimental results show that the   predictive model exhibits consistently good performance in a variety   of domains.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/dimitrakakis10a.html
  PDF: http://proceedings.mlr.press/v9/dimitrakakis10a/dimitrakakis10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-dimitrakakis10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Christos
    family: Dimitrakakis
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 161-168
  id: dimitrakakis10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 161
  lastpage: 168
  published: 2010-03-31 00:00:00 +0000
- title: 'Nonparametric Bayesian Matrix Factorization by Power-EP'
  abstract: 'Many real-world applications can be modeled by matrix factorization. By approximating an observed data matrix  as the product of two latent matrices,  matrix factorization can reveal hidden structures embedded in data. A common challenge to use matrix factorization is determining the dimensionality of the latent matrices from data. Indian Buffet Processes (IBPs) enable us to apply the nonparametric Bayesian machinery to address this challenge. However, it remains a difficult task to learn nonparametric Bayesian matrix factorization models. In this paper, we propose a novel variational Bayesian method based on new equivalence classes of infinite matrices for learning these models. Furthermore, inspired by the success of nonnegative matrix factorization on many learning problems, we impose nonnegativity constraints on the latent matrices and mix variational inference with expectation propagation. This mixed inference method is unified in a power expectation propagation framework. Experimental results on image decomposition demonstrate the superior computational efficiency and the higher prediction accuracy of our methods compared to alternative Monte Carlo and variational inference methods for IBP models. We also apply the new methods to collaborative filtering and role mining and show the improved predictive performance over other matrix factorization methods.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/ding10a.html
  PDF: http://proceedings.mlr.press/v9/ding10a/ding10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ding10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nan
    family: Ding
  - given: Yuan
    family: Qi
  - given: Rongjing
    family: Xiang
  - given: Ian
    family: Molloy
  - given: Ninghui
    family: Li
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 169-176
  id: ding10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 169
  lastpage: 176
  published: 2010-03-31 00:00:00 +0000
- title: 'Neural conditional random fields'
  abstract: 'We propose a non-linear graphical model for structured prediction. It combines the power of deep neural networks to extract high level features with the graphical framework of Markov networks, yielding a powerful and scalable probabilistic model that we apply to signal labeling tasks.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/do10a.html
  PDF: http://proceedings.mlr.press/v9/do10a/do10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-do10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Trinh–Minh–Tri
    family: Do
  - given: Thierry
    family: Artieres
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 177-184
  id: do10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 177
  lastpage: 184
  published: 2010-03-31 00:00:00 +0000
- title: 'Combining Experiments to Discover Linear Cyclic Models with Latent Variables'
  abstract: 'We present an algorithm to infer causal relations between a set of measured variables on the basis of experiments on these variables. The algorithm assumes that the causal relations are linear, but is otherwise completely general: It provides consistent estimates when the true causal structure contains feedback loops and latent variables, while the experiments can involve surgical or ’soft’ interventions on one or multiple variables at a time. The algorithm is ’online’ in the sense that it combines the results from any set of available experiments, can incorporate background knowledge and resolves conflicts that arise from combining results from different experiments. In addition we provide a necessary and sufficient condition that (i) determines when the algorithm can uniquely return the true graph, and (ii) can be used to select the next best experiment until this condition is satisfied. We demonstrate the method by applying it to simulated data and the flow cytometry data of Sachs et al (2005).'
  volume: 9
  URL: https://proceedings.mlr.press/v9/eberhardt10a.html
  PDF: http://proceedings.mlr.press/v9/eberhardt10a/eberhardt10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-eberhardt10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Frederick
    family: Eberhardt
  - given: Patrik
    family: Hoyer
  - given: Richard
    family: Scheines
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 185-192
  id: eberhardt10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 185
  lastpage: 192
  published: 2010-03-31 00:00:00 +0000
- title: 'Graphical Gaussian modelling of multivariate time series with latent variables'
  abstract: 'In time series analysis, inference about cause-effect relationships among multiple times series is commonly based on the concept of Granger causality, which exploits temporal structure to achieve causal ordering of dependent variables. One major problem in the application of Granger causality for the identification of causal relationships is the possible presence of latent variables that affect the measured components and thus lead to so-called spurious causalities. In this paper, we describe a new graphical approach for modelling the dependence structure of multivariate stationary time series that are affected by latent variables. To this end, we introduce dynamic maximal ancestral graphs (dMAGs), in which each time series is represented by a single vertex. For Gaussian processes, this approach leads to vector autoregressive models with errors that are not independent but correlated according to the dashed edges in the graph. We discuss identifiability of the parameters and show that these models can be viewed as graphical ARMA models that satisfy the Granger causality restrictions encoded by the associated dynamic maximal ancestral graph.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/eichler10a.html
  PDF: http://proceedings.mlr.press/v9/eichler10a/eichler10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-eichler10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michael
    family: Eichler
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 193-200
  id: eichler10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 193
  lastpage: 200
  published: 2010-03-31 00:00:00 +0000
- title: 'Why Does Unsupervised Pre-training Help Deep Learning?'
  abstract: 'Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder   variants with impressive results being obtained in several areas, mostly   on vision and language datasets.  The best results obtained on supervised   learning tasks often involve an unsupervised learning component, usually   in an unsupervised pre-training phase. The main question investigated   here is the following: why does unsupervised pre-training work so well?   Through extensive experimentation, we explore several possible   explanations discussed in the literature including its action as a   regularizer (Erhan et al. 2009) and as an aid to optimization   (Bengio et al. 2007).  Our results build on the work of   Erhan et al. 2009, showing that unsupervised pre-training appears to   play predominantly a regularization role in subsequent supervised   training. However our results in an online setting, with a virtually unlimited   data stream, point to a somewhat more nuanced interpretation of the roles   of optimization and regularization in the unsupervised pre-training   effect.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/erhan10a.html
  PDF: http://proceedings.mlr.press/v9/erhan10a/erhan10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-erhan10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dumitru
    family: Erhan
  - given: Aaron
    family: Courville
  - given: Yoshua
    family: Bengio
  - given: Pascal
    family: Vincent
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 201-208
  id: erhan10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 201
  lastpage: 208
  published: 2010-03-31 00:00:00 +0000
- title: 'Semi-Supervised Learning via Generalized Maximum Entropy'
  abstract: 'Various supervised inference methods can be analyzed as convex duals of the generalized maximum entropy (MaxEnt) framework. Generalized MaxEnt aims to find a distribution that maximizes an entropy function while respecting prior information represented as potential functions in miscellaneous forms of constraints and/or penalties. We extend this framework to semi-supervised learning by incorporating unlabeled data via modifications to these potential functions reflecting structural assumptions on the data geometry. The proposed approach leads to a family of discriminative semi-supervised algorithms, that are convex, scalable, inherently multi-class, easy to implement, and that can be kernelized naturally. Experimental evaluation of special cases shows the competitiveness of our methodology.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/erkan10a.html
  PDF: http://proceedings.mlr.press/v9/erkan10a/erkan10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-erkan10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ayse
    family: Erkan
  - given: Yasemin
    family: Altun
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 209-216
  id: erkan10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 209
  lastpage: 216
  published: 2010-03-31 00:00:00 +0000
- title: 'Model-Free Monte Carlo-like Policy Evaluation'
  abstract: 'We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the  sparsity of the sample of one-step transitions.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/fonteneau10a.html
  PDF: http://proceedings.mlr.press/v9/fonteneau10a/fonteneau10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-fonteneau10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Raphael
    family: Fonteneau
  - given: Susan
    family: Murphy
  - given: Louis
    family: Wehenkel
  - given: Damien
    family: Ernst
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 217-224
  id: fonteneau10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 217
  lastpage: 224
  published: 2010-03-31 00:00:00 +0000
- title: 'A Weighted Multi-Sequence Markov Model For Brain Lesion Segmentation'
  abstract: 'We propose a technique for fusing the output of multiple Magnetic Resonance (MR) sequences to robustly and accurately segment brain lesions. It is based on an augmented multi-sequence Hidden Markov model that includes additional weight variables to account for the relative importance and control the impact of each sequence. The augmented framework has the advantage of allowing 1) the incorporation of expert knowledge on the  a priori relevant information content of each sequence and 2) a weighting scheme which is modified adaptively according to the data and the segmentation task under consideration. The model, applied to the detection of multiple sclerosis and stroke lesions shows promising results.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/forbes10a.html
  PDF: http://proceedings.mlr.press/v9/forbes10a/forbes10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-forbes10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Florence
    family: Forbes
  - given: Senan
    family: Doyle
  - given: Daniel
    family: Garcia–Lorenzo
  - given: Christian
    family: Barillot
  - given: Michel
    family: Dojat
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 225-232
  id: forbes10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 225
  lastpage: 232
  published: 2010-03-31 00:00:00 +0000
- title: 'Posterior distributions are computable from predictive distributions'
  abstract: 'As we devise more complicated prior distributions, will inference algorithms keep up?  We highlight a negative result in computable probability theory by Ackerman, Freer, and Roy (2010) that shows that there exist computable priors with noncomputable posteriors.  In addition to providing a brief survey of computable probability theory geared towards the A.I. and statistics community, we give a new result characterizing when conditioning is computable in the setting of exchangeable sequences, and provide a computational perspective on work by Orbanz (2010) on conjugate nonparametric models.  In particular, using a computable extension of de Finetti’s theorem (Freer and Roy 2009), we describe how to transform a posterior predictive rule for generating an exchangeable sequence into an algorithm for computing the posterior distribution of the directing random measure.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/freer10a.html
  PDF: http://proceedings.mlr.press/v9/freer10a/freer10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-freer10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Cameron
    family: Freer
  - given: Daniel
    family: Roy
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 233-240
  id: freer10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 233
  lastpage: 240
  published: 2010-03-31 00:00:00 +0000
- title: 'Variational methods for Reinforcement Learning'
  abstract: 'We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained from which the optimal decision policy is formed. The classical maximum likelihood point estimate of the transition model does not reflect the uncertainty in the estimate of the transition model and the resulting policies may consequently lack a sufficient degree of exploration. We consider a Bayesian alternative that maintains a distribution over the transition so that the resulting policy takes into account the limited experience of the environment. The resulting algorithm is formally intractable and we discuss two approximate solution methods, Variational Bayes and Expectation Propagation.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/furmston10a.html
  PDF: http://proceedings.mlr.press/v9/furmston10a/furmston10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-furmston10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Thomas
    family: Furmston
  - given: David
    family: Barber
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 241-248
  id: furmston10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 241
  lastpage: 248
  published: 2010-03-31 00:00:00 +0000
- title: 'Understanding the difficulty of training deep feedforward neural networks'
  abstract: 'Whereas before 2006 it appears that deep multi-layer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.  We first observe the influence of the non-linear activations functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, and explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1.  Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/glorot10a.html
  PDF: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-glorot10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xavier
    family: Glorot
  - given: Yoshua
    family: Bengio
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 249-256
  id: glorot10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 249
  lastpage: 256
  published: 2010-03-31 00:00:00 +0000
- title: 'On Combining Graph-based Variance Reduction schemes'
  abstract: 'In this paper, we consider two variance reduction schemes that exploit the structure of the primal graph of the graphical model: Rao-Blackwellised w-cutset sampling and AND/OR sampling. We show that the two schemes are orthogonal and can be combined to further reduce the variance. Our combination yields a new family of estimators which trade time and space with variance. We demonstrate experimentally that the new estimators are superior, often yielding an order of magnitude improvement over previous schemes on several benchmarks.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/gogate10a.html
  PDF: http://proceedings.mlr.press/v9/gogate10a/gogate10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-gogate10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vibhav
    family: Gogate
  - given: Rina
    family: Dechter
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 257-264
  id: gogate10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 257
  lastpage: 264
  published: 2010-03-31 00:00:00 +0000
- title: 'Locally Linear Denoising on Image Manifolds'
  abstract: 'We study the problem of image denoising where images are assumed to be samples from low dimensional  (sub)manifolds. We propose the algorithm of locally linear denoising. The algorithm approximates manifolds with locally linear patches by constructing nearest neighbor graphs. Each image is then locally denoised within its neighborhoods. A global optimal denoising result is then identified by aligning those local estimates. The algorithm has a closed-form solution that is efficient to compute. We evaluated and compared the algorithm to alternative methods on two image data sets. We demonstrated the effectiveness of the proposed algorithm, which yields visually appealing denoising results, incurs smaller reconstruction errors and results in lower error rates when the denoised data are used in supervised learning tasks.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/gong10a.html
  PDF: http://proceedings.mlr.press/v9/gong10a/gong10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-gong10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dian
    family: Gong
  - given: Fei
    family: Sha
  - given: Gérard
    family: Medioni
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 265-272
  id: gong10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 265
  lastpage: 272
  published: 2010-03-31 00:00:00 +0000
- title: 'Regret Bounds for Gaussian Process Bandit Problems'
  abstract: 'Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modeled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/grunewalder10a.html
  PDF: http://proceedings.mlr.press/v9/grunewalder10a/grunewalder10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-grunewalder10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Steffen
    family: Grünewälder
  - given: Jean–Yves
    family: Audibert
  - given: Manfred
    family: Opper
  - given: John
    family: Shawe–Taylor
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 273-280
  id: grunewalder10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 273
  lastpage: 280
  published: 2010-03-31 00:00:00 +0000
- title: 'Sufficient covariates and linear propensity analysis'
  abstract: 'Working within the decision-theoretic framework for causal inference, we study the properties of “sufficient covariates", which support causal inference from observational data, and possibilities for their reduction. In particular we illustrate the role of a propensity variable by means of a simple model, and explain why such a reduction typically does not increase (and may reduce) estimation efficiency.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/guo10a.html
  PDF: http://proceedings.mlr.press/v9/guo10a/guo10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-guo10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hui
    family: Guo
  - given: Philip
    family: Dawid
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 281-288
  id: guo10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 281
  lastpage: 288
  published: 2010-03-31 00:00:00 +0000
- title: 'Real-time Multiattribute Bayesian Preference Elicitation with Pairwise Comparison Queries'
  abstract: 'Preference elicitation (PE) is an important component of interactive decision support systems that aim to make optimal recommendations to users by actively querying their preferences.  In this paper, we outline five principles important for PE in real-world problems: (1) real-time, (2) multiattribute, (3) low cognitive load, (4) robust to noise, and (5) scalable.  In light of these requirements, we introduce an approximate PE framework based on TrueSkill for performing efficient closed-form Bayesian updates and query selection for a multiattribute utility belief state — a novel PE approach that naturally facilitates the efficient evaluation of value of information (VOI) heuristics for use in query selection strategies.  Our best VOI query strategy satisfies all five principles (in contrast to related work) and performs on par with the most accurate (and often computationally intensive) algorithms on experiments with synthetic and real-world datasets.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/guo10b.html
  PDF: http://proceedings.mlr.press/v9/guo10b/guo10b.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-guo10b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shengbo
    family: Guo
  - given: Scott
    family: Sanner
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 289-296
  id: guo10b
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 289
  lastpage: 296
  published: 2010-03-31 00:00:00 +0000
- title: 'Noise-contrastive estimation: A new estimation principle for unnormalized statistical models'
  abstract: 'We present a new estimation principle for parameterized statistical models. The idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise, using the model log-density function in the regression nonlinearity.  We show that this leads to a consistent (convergent) estimator of the parameters, and analyze the asymptotic variance.  In particular, the method is shown to directly work for unnormalized models, i.e. models where the density function does not integrate to one. The normalization constant can be estimated just like any other parameter. For a tractable ICA model, we compare the method with other estimation methods that can be used to learn unnormalized models, including score matching, contrastive divergence, and maximum-likelihood where the normalization constant is estimated with importance sampling. Simulations show that noise-contrastive estimation offers the best trade-off between computational and statistical efficiency. The method is then applied to the modeling of natural images: We show that the method can successfully estimate a large-scale two-layer model and a Markov random field.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/gutmann10a.html
  PDF: http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-gutmann10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michael
    family: Gutmann
  - given: Aapo
    family: Hyvärinen
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 297-304
  id: gutmann10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 297
  lastpage: 304
  published: 2010-03-31 00:00:00 +0000
- title: 'Boosted Optimization for Network Classification'
  abstract: 'In this paper we propose a new classification algorithm designed for application on complex networks motivated by algorithmic similarities between boosting learning and message passing.  We consider a network classifier as a logistic regression where the variables define the nodes and the interaction effects define the edges.  From this definition we represent the problem as a factor graph of local exponential loss functions.  Using the factor graph representation it is possible to interpret the network classifier as an ensemble of individual node classifiers.  We then combine ideas from boosted learning with network optimization algorithms to define two novel algorithms, Boosted Expectation Propagation (BEP) and Boosted Message Passing (BMP).  These algorithms optimize the global network classifier performance by locally weighting each node classifier by the error of the surrounding network structure.  We compare the performance of BEP and BMP to logistic regression as well state of the art penalized logistic regression models on simulated grid structured networks.  The results show that using local boosting to optimize the performance of a network classifier increases classification performance and is especially powerful in cases when the whole network structure must be considered for accurate classification.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/hancock10a.html
  PDF: http://proceedings.mlr.press/v9/hancock10a/hancock10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hancock10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Timothy
    family: Hancock
  - given: Hiroshi
    family: Mamitsuka
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 305-312
  id: hancock10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 305
  lastpage: 312
  published: 2010-03-31 00:00:00 +0000
- title: 'Dirichlet Process Mixtures of Generalized Linear Models'
  abstract: 'We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLMs), a new method of nonparametric regression that accommodates continuous and categorical inputs, models a response variable locally by a generalized linear model.  We give conditions for the existence and asymptotic unbiasedness of the DP-GLM regression mean function estimate;  we then give a practical example for when those conditions hold. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression including regression trees and Gaussian processes.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/hannah10a.html
  PDF: http://proceedings.mlr.press/v9/hannah10a/hannah10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hannah10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lauren
    family: Hannah
  - given: David
    family: Blei
  - given: Warren
    family: Powell
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 313-320
  id: hannah10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 313
  lastpage: 320
  published: 2010-03-31 00:00:00 +0000
- title: 'Negative Results for Active Learning with Convex Losses'
  abstract: 'We study the problem of active learning with convex loss functions.  We prove that even under bounded noise constraints, the minimax rates for proper active learning are often no better than passive learning.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/hanneke10a.html
  PDF: http://proceedings.mlr.press/v9/hanneke10a/hanneke10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hanneke10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Steve
    family: Hanneke
  - given: Liu
    family: Yang
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 321-325
  id: hanneke10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 321
  lastpage: 325
  published: 2010-03-31 00:00:00 +0000
- title: 'Coherent Inference on Optimal Play in Game Trees'
  abstract: 'Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/hennig10a.html
  PDF: http://proceedings.mlr.press/v9/hennig10a/hennig10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-hennig10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Philipp
    family: Hennig
  - given: David
    family: Stern
  - given: Thore
    family: Graepel
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 326-333
  id: hennig10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 326
  lastpage: 333
  published: 2010-03-31 00:00:00 +0000
- title: 'Collaborative Filtering via Rating Concentration'
  abstract: 'While most popular collaborative filtering methods use low-rank matrix factorization and parametric density assumptions, this article proposes an approach based on distribution-free concentration inequalities. Using agnostic hierarchical sampling assumptions, functions of observed ratings are provably close to their expectations over query ratings, on average. A joint probability distribution over queries of interest is estimated using maximum entropy regularization. The distribution resides in a convex hull of allowable candidate distributions which satisfy concentration inequalities that stem from the sampling assumptions. The method accurately estimates rating distributions on synthetic and real data and is competitive with low rank and parametric methods which make more aggressive assumptions about the problem.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/huang10a.html
  PDF: http://proceedings.mlr.press/v9/huang10a/huang10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-huang10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Bert
    family: Huang
  - given: Tony
    family: Jebara
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 334-341
  id: huang10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 334
  lastpage: 341
  published: 2010-03-31 00:00:00 +0000
- title: 'Maximum-likelihood learning of cumulative distribution functions on graphs'
  abstract: 'For many applications, a probability model can be easily expressed as a cumulative distribution functions (CDF) as compared to the use of probability density or mass functions (PDF/PMFs).  Cumulative distribution networks (CDNs) have recently been proposed as a class of graphical models for CDFs. One advantage of CDF models is the simplicity of representing multivariate heavy-tailed distributions. Examples of fields that can benefit from the use of graphical models for CDFs include climatology and epidemiology, where data may follow extreme value statistics and exhibit spatial correlations so that dependencies between model variables must be accounted for. The problem of learning from data in such settings may nevertheless consist of optimizing the log-likelihood function with respect to model parameters where we are required to optimize a log-PDF/PMF and not a log-CDF.   We present a message-passing algorithm called the gradient-derivative-product (GDP) algorithm that allows us to learn the model in terms of the log-likelihood function whereby messages correspond to local gradients of the likelihood with respect to model parameters.  We will demonstrate the GDP algorithm on real-world rainfall and H1N1 mortality data and we will show that CDNs provide a natural choice of parameterizations for the heavy-tailed multivariate distributions that arise in these problems.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/huang10b.html
  PDF: http://proceedings.mlr.press/v9/huang10b/huang10b.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-huang10b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jim
    family: Huang
  - given: Nebojsa
    family: Jojic
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 342-349
  id: huang10b
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 342
  lastpage: 349
  published: 2010-03-31 00:00:00 +0000
- title: 'Learning Nonlinear Dynamic Models from Non-sequenced Data'
  abstract: 'Virtually all methods of learning dynamic systems from data start from the same basic assumption: the learning algorithm will be given a sequence, or trajectory, of data generated from the dynamic system.  We consider the case where the data is not sequenced.  The training data points come from the system’s operation but with no temporal ordering.  The data are simply drawn as individual disconnected points.  While making this assumption may seem absurd at first glance, many scientific modeling tasks have exactly this property.  Previous work proposed methods for learning linear, discrete time models under these assumptions by optimizing approximate likelihood functions.  In this paper, we extend those methods to nonlinear models using kernel methods.  We go on to propose a new approach to solving the problem that focuses on achieving temporal smoothness in the learned dynamics.  The result is a convex criterion that can be easily optimized and often outperforms the earlier methods.  We test these methods on several synthetic data sets including one generated from the Lorenz attractor.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/huang10c.html
  PDF: http://proceedings.mlr.press/v9/huang10c/huang10c.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-huang10c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tzu–Kuo
    family: Huang
  - given: Le
    family: Song
  - given: Jeff
    family: Schneider
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 350-357
  id: huang10c
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 350
  lastpage: 357
  published: 2010-03-31 00:00:00 +0000
- title: 'Learning Bayesian Network Structure using LP Relaxations'
  abstract: 'We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive or faster than alternative exact methods based on dynamic programming.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/jaakkola10a.html
  PDF: http://proceedings.mlr.press/v9/jaakkola10a/jaakkola10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-jaakkola10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tommi
    family: Jaakkola
  - given: David
    family: Sontag
  - given: Amir
    family: Globerson
  - given: Marina
    family: Meila
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 358-365
  id: jaakkola10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 358
  lastpage: 365
  published: 2010-03-31 00:00:00 +0000
- title: 'Structured Sparse Principal Component Analysis'
  abstract: 'We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by Jenatton et al. (2009). While classical sparse priors only deal with cardinality, the regularization we use encodes higher-order information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, the denoising of sparse structured signals and face recognition, demonstrate the benefits of the proposed structured approach over unstructured approaches.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/jenatton10a.html
  PDF: http://proceedings.mlr.press/v9/jenatton10a/jenatton10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-jenatton10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rodolphe
    family: Jenatton
  - given: Guillaume
    family: Obozinski
  - given: Francis
    family: Bach
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 366-373
  id: jenatton10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 366
  lastpage: 373
  published: 2010-03-31 00:00:00 +0000
- title: 'Nonlinear functional regression: a functional RKHS approach'
  abstract: 'This paper deals with functional regression, in which the input attributes as well as the response are functions. To deal with this problem, we develop a functional reproducing kernel Hilbert space approach; here, a kernel is an operator acting on a function and yielding a function. We demonstrate basic properties of these functional RKHS, as well as a representer theorem for this setting; we investigate the construction of kernels; we provide some experimental insight.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/kadri10a.html
  PDF: http://proceedings.mlr.press/v9/kadri10a/kadri10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kadri10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hachem
    family: Kadri
  - given: Emmanuel
    family: Duflos
  - given: Philippe
    family: Preux
  - given: Stéphane
    family: Canu
  - given: Manuel
    family: Davy
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 374-380
  id: kadri10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 374
  lastpage: 380
  published: 2010-03-31 00:00:00 +0000
- title: 'Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity'
  abstract: 'The versatility of exponential families, along with their attendant convexity properties, make them a popular and effective statistical model. A central issue is learning these models in high-dimensions when the optimal parameter  vector is sparse. This work characterizes a certain strong convexity property of general exponential families, which allows their generalization ability to be quantified. In particular, we show how this property can be used to analyze generic exponential families under L1 regularization.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/kakade10a.html
  PDF: http://proceedings.mlr.press/v9/kakade10a/kakade10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kakade10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sham
    family: Kakade
  - given: Ohad
    family: Shamir
  - given: Karthik
    family: Sindharan
  - given: Ambuj
    family: Tewari
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 381-388
  id: kakade10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 381
  lastpage: 388
  published: 2010-03-31 00:00:00 +0000
- title: 'Collaborative Filtering on a Budget'
  abstract: 'Matrix factorization is a successful technique for building   collaborative filtering systems. While it works well on a large   range of problems, it is also known for requiring significant   amounts of storage for each user or item to be added to the   database.  This is a problem whenever the collaborative filtering   task is larger than the medium-sized Netflix Prize data.    In this paper, we propose a new model for representing and   compressing matrix factors via hashing.  This allows for   essentially unbounded storage (at a graceful storage / performance   trade-off) for users and items to be represented in a pre-defined   memory footprint.  It allows us to scale recommender systems to very   large numbers of users or conversely, obtain very good performance   even for tiny models (e.g. 400kB of data suffice for a   representation of the EachMovie problem).    We provide both experimental results and approximation bounds for   our compressed representation and we show how this approach can be   extended to multipartite problems.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/karatzoglou10a.html
  PDF: http://proceedings.mlr.press/v9/karatzoglou10a/karatzoglou10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-karatzoglou10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexandros
    family: Karatzoglou
  - given: Alex
    family: Smola
  - given: Markus
    family: Weimer
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 389-396
  id: karatzoglou10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 389
  lastpage: 396
  published: 2010-03-31 00:00:00 +0000
- title: 'Fast Active-set-type Algorithms for $l1$-regularized Linear Regression'
  abstract: 'In this paper, we investigate new active-set-type methods for l1-regularized linear regression that overcome some difficulties of existing active set methods. By showing a relationship between $l1$-regularized linear regression and the linear complementarity problem with bounds, we present a fast active-set-type method, called block principal pivoting. This method accelerates computation by allowing exchanges of several variables among working sets. We further provide an improvement of this method, discuss its properties, and also explain a connection to the structure learning of Gaussian graphical models. Experimental comparisons on synthetic and real data sets show that the proposed method is significantly faster than existing active set methods and competitive against recently developed iterative methods.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/kim10a.html
  PDF: http://proceedings.mlr.press/v9/kim10a/kim10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kim10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jingu
    family: Kim
  - given: Haesun
    family: Park
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 397-404
  id: kim10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 397
  lastpage: 404
  published: 2010-03-31 00:00:00 +0000
- title: 'Online Anomaly Detection under Adversarial Impact'
  abstract: 'Security analysis of learning algorithms is gaining increasing importance, especially since they have become target of deliberate obstruction in certain applications. Some security-hardened algorithms have been previously proposed for supervised learning; however, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution, we analyze the performance of a particular method—online centroid anomaly detection—in the presence of adversarial noise. Our analysis addresses three key security-related issues: derivation of an optimal attack, analysis of its efficiency and constraints. Experimental evaluation carried out on real HTTP and exploit traces confirms the tightness of our theoretical bounds.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/kloft10a.html
  PDF: http://proceedings.mlr.press/v9/kloft10a/kloft10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kloft10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marius
    family: Kloft
  - given: Pavel
    family: Laskov
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 405-412
  id: kloft10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 405
  lastpage: 412
  published: 2010-03-31 00:00:00 +0000
- title: 'Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach'
  abstract: 'We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure to perform variable selection in ultra-high dimensional multiple output regression problems, which is the first attempt to utilize multiple outputs to perform fast removal of the irrelevant variables. As our main theoretical contribution, we show that the S-OMP can be used to reduce an ultra-high number of variables to below the sample size, without losing relevant variables.  We also provide formal evidence that the modified Bayesian information criterion (BIC) can be used to efficiently select the number of iterations in the S-OMP. Once the number of variables has been reduced to a manageable size, we show that a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. We further provide evidence on the benefit of variable selection using the regression outputs jointly, as opposed to performing variable selection for each output separately. The finite sample performance of the S-OMP has been demonstrated on extensive simulation studies.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/kolar10a.html
  PDF: http://proceedings.mlr.press/v9/kolar10a/kolar10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kolar10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mladen
    family: Kolar
  - given: Eric
    family: Xing
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 413-420
  id: kolar10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 413
  lastpage: 420
  published: 2010-03-31 00:00:00 +0000
- title: 'Semi-Supervised Learning with Max-Margin Graph Cuts'
  abstract: 'This paper proposes a novel algorithm for semi-supervised learning. This algorithm learns graph cuts that maximize the margin with respect to the labels induced by the harmonic function solution. We motivate the approach, compare it to existing work, and prove a bound on its generalization error. The quality of our solutions is evaluated on a synthetic problem and three UCI ML repository datasets. In most cases, we outperform manifold regularization of support vector machines, which is a state-of-the-art approach to semi-supervised max-margin learning.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/kveton10a.html
  PDF: http://proceedings.mlr.press/v9/kveton10a/kveton10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-kveton10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Branislav
    family: Kveton
  - given: Michal
    family: Valko
  - given: Ali
    family: Rahimi
  - given: Ling
    family: Huang
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 421-428
  id: kveton10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 421
  lastpage: 428
  published: 2010-03-31 00:00:00 +0000
- title: 'Solving the Uncapacitated Facility Location Problem Using Message Passing Algorithms'
  abstract: 'The Uncapacitated Facility Location Problem (UFLP) is one of the most widely studied discrete location problems, whose applications arise in a variety of settings.  We tackle the UFLP using probabilistic inference in a graphical model - an approach that has received little attention in the past.  We show that the fixed points of max-product linear programming (MPLP), a convexified version of the max-product algorithm, can be used to construct a solution with a 3-approximation guarantee for metric UFLP instances.  In addition, we characterize some scenarios under which the MPLP solution is guaranteed to be globally optimal.  We evaluate the performance of both max-sum and MPLP empirically on metric and non-metric problems, demonstrating the advantages of the 3-approximation construction and algorithm applicability to non-metric instances.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/lazic10a.html
  PDF: http://proceedings.mlr.press/v9/lazic10a/lazic10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lazic10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nevena
    family: Lazic
  - given: Brendan
    family: Frey
  - given: Parham
    family: Aarabi
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 429-436
  id: lazic10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 429
  lastpage: 436
  published: 2010-03-31 00:00:00 +0000
- title: 'Relating Function Class Complexity and Cluster Structure in the Function Domain with Applications to Transduction'
  abstract: 'We relate function class complexity to structure in the function domain. This facilitates risk analysis relative to cluster structure in the input space which is particularly effective in semi-supervised learning. In particular we quantify the complexity of function classes defined over a graph in terms of the graph structure.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/lever10a.html
  PDF: http://proceedings.mlr.press/v9/lever10a/lever10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lever10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Guy
    family: Lever
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 437-444
  id: lever10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 437
  lastpage: 444
  published: 2010-03-31 00:00:00 +0000
- title: 'The Feature Selection Path in Kernel Methods'
  abstract: 'The problem of automatic feature selection/weighting in kernel methods is examined. We work on a formulation that optimizes both the weights of features and the parameters of the kernel model simultaneously, using $L_1$ regularization for feature selection. Under quite general choices of kernels, we prove that there exists a unique regularization path for this problem, that runs from 0 to a stationary point of the non-regularized problem. We propose an ODE-based homotopy method to follow this trajectory. By following the path, our algorithm is able to automatically discard irrelevant features and to automatically go back and forth to avoid local optima. Experiments on synthetic and real datasets show that the method achieves low prediction error and is efficient in separating relevant from irrelevant features.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/li10a.html
  PDF: http://proceedings.mlr.press/v9/li10a/li10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-li10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Fuxin
    family: Li
  - given: Cristian
    family: Sminchisescu
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 445-452
  id: li10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 445
  lastpage: 452
  published: 2010-03-31 00:00:00 +0000
- title: 'Simple Exponential Family PCA'
  abstract: 'Bayesian principal component analysis (BPCA), a probabilistic reformulation of PCA with Bayesian model selection, is a systematic approach to determining the number of essential principal components (PCs) for data representation.  However, it assumes that data are Gaussian distributed and thus it cannot handle all types of practical observations, e.g. integers and binary values.  In this paper, we propose simple exponential family PCA (SePCA), a generalised family of probabilistic principal component analysers. SePCA employs exponential family distributions to handle general types of observations. By using Bayesian inference, SePCA also automatically discovers the number of essential PCs. We discuss techniques for fitting the model, develop the corresponding mixture model, and show the effectiveness of the model based on experiments.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/li10b.html
  PDF: http://proceedings.mlr.press/v9/li10b/li10b.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-li10b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jun
    family: Li
  - given: Dacheng
    family: Tao
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 453-460
  id: li10b
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 453
  lastpage: 460
  published: 2010-03-31 00:00:00 +0000
- title: 'The Group Dantzig Selector'
  abstract: 'We introduce a new method – the  group Dantzig selector – for high dimensional sparse regression with  group structure, which has a convincing theory about why utilizing the group structure can be beneficial. Under a  group restricted isometry condition, we obtain a significantly improved nonasymptotic $\ell_2$-norm bound over the basis pursuit or the Dantzig selector which ignores the group structure.   To gain more insight, we also introduce a surprisingly simple and intuitive  “sparsity oracle condition” to obtain a block $\ell_1$-norm bound, which is easily accessible to a broad audience in machine learning community. Encouraging numerical results are also provided to support our theory.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/liu10a.html
  PDF: http://proceedings.mlr.press/v9/liu10a/liu10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-liu10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Han
    family: Liu
  - given: Jian
    family: Zhang
  - given: Xiaoye
    family: Jiang
  - given: Jun
    family: Liu
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 461-468
  id: liu10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 461
  lastpage: 468
  published: 2010-03-31 00:00:00 +0000
- title: 'Descent Methods for Tuning Parameter Refinement'
  abstract: 'This paper addresses multidimensional tuning parameter selection in the context of “train-validate-test” and $K$-fold cross validation. A coarse grid search over tuning parameter space is used to initialize a descent method which then jointly optimizes over variables and tuning parameters.  We study four regularized regression methods and develop the update equations for the corresponding descent algorithms.  Experiments on both simulated and real-world datasets show that the method results in significant tuning parameter refinement.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/lorbert10a.html
  PDF: http://proceedings.mlr.press/v9/lorbert10a/lorbert10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lorbert10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Lorbert
  - given: Peter
    family: Ramadge
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 469-476
  id: lorbert10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 469
  lastpage: 476
  published: 2010-03-31 00:00:00 +0000
- title: 'Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net'
  abstract: 'A new approach to regression regularization called the Pairwise Elastic Net is proposed. Like the Elastic Net, it simultaneously performs automatic variable selection and continuous shrinkage. In addition, the Pairwise Elastic Net encourages the grouping of strongly correlated predictors based on a pairwise similarity measure. We give examples of how the Pairwise Elastic Net can be used to achieve the objectives of Ridge regression, the Lasso, the Elastic Net, and Group Lasso. Finally, we present a coordinate descent algorithm to solve the Pairwise Elastic Net.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/lorbert10b.html
  PDF: http://proceedings.mlr.press/v9/lorbert10b/lorbert10b.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lorbert10b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Lorbert
  - given: David
    family: Eis
  - given: Victoria
    family: Kostina
  - given: David
    family: Blei
  - given: Peter
    family: Ramadge
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 477-484
  id: lorbert10b
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 477
  lastpage: 484
  published: 2010-03-31 00:00:00 +0000
- title: 'Contextual Multi-Armed Bandits'
  abstract: 'We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric.  Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions so as to maximize the total payoff of the chosen actions.  The payoff depends on both the action chosen and the context. In contrast, context-free multi-armed bandit problems, a focus of much previous research, model situations where no side information is available and the payoff depends only on the action chosen.  Our problem is motivated by sponsored web search, where the task is to display ads to a user of an Internet search engine based on her search query so as to maximize the click-through rate (CTR) of the ads displayed.  We cast this problem as a contextual multi-armed bandit problem where queries and ads form metric spaces and the payoff function is Lipschitz with respect to both the metrics. For any $\epsilon > 0$ we present an algorithm with regret $O(T^{\frac{a+b+1}{a+b+2} + \epsilon})$ where $a, b$ are the covering dimensions of the query space and the ad space respectively. We prove a lower bound $\Omega(T^{\frac{\tilde{a}+\tilde{b}+1}{\tilde{a}+\tilde{b}+2} - \epsilon})$ for the regret of any algorithm where $\tilde{a}, \tilde{b}$ are packing dimensions of the query spaces and the ad space respectively. For finite spaces or convex bounded subsets of Euclidean spaces, this gives an almost matching upper and lower bound.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/lu10a.html
  PDF: http://proceedings.mlr.press/v9/lu10a/lu10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-lu10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tyler
    family: Lu
  - given: David
    family: Pal
  - given: Martin
    family: Pal
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 485-492
  id: lu10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 485
  lastpage: 492
  published: 2010-03-31 00:00:00 +0000
- title: 'Exploiting Feature Covariance in High-Dimensional Online Learning'
  abstract: 'Some online algorithms for linear classification model the uncertainty in their weights over the course of learning.  Modeling the full covariance structure of the weights can provide a significant advantage for classification.  However, for high-dimensional, large-scale data, even though there may be many second-order feature interactions, it is computationally infeasible to maintain this covariance structure. To extend second-order methods to high-dimensional data, we develop low-rank approximations of the covariance structure. We evaluate our approach on both synthetic and real-world data sets using the confidence-weighted online learning framework. We show improvements over diagonal covariance matrices for both low and high-dimensional data.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/ma10a.html
  PDF: http://proceedings.mlr.press/v9/ma10a/ma10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ma10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Justin
    family: Ma
  - given: Alex
    family: Kulesza
  - given: Mark
    family: Dredze
  - given: Koby
    family: Crammer
  - given: Lawrence
    family: Saul
  - given: Fernando
    family: Pereira
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 493-500
  id: ma10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 493
  lastpage: 500
  published: 2010-03-31 00:00:00 +0000
- title: 'Supervised Dimension Reduction Using Bayesian Mixture Modeling'
  abstract: 'We develop a Bayesian framework for supervised dimension reduction using a flexible nonparametric Bayesian mixture modeling approach. Our method retrieves the dimension reduction or d.r. subspace by utilizing a dependent Dirichlet process that allows for natural clustering for the data in terms of both the response and predictor variables. Formal probabilistic models with likelihoods and priors are given and efficient posterior sampling of the d.r. subspace can be obtained by a Gibbs sampler. As the posterior draws are linear subspaces which are points on a Grassmann manifold, we output the posterior mean d.r. subspace with respect to geodesics on the Grassmannian. The utility of our approach is illustrated on a set of simulated and real examples.  Some Key Words: supervised dimension reduction, inverse regression, Dirichlet process, factor models, Grassman manifold.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/mao10a.html
  PDF: http://proceedings.mlr.press/v9/mao10a/mao10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-mao10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kai
    family: Mao
  - given: Feng
    family: Liang
  - given: Sayan
    family: Mukherjee
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 501-508
  id: mao10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 501
  lastpage: 508
  published: 2010-03-31 00:00:00 +0000
- title: 'Inductive Principles for Restricted Boltzmann Machine Learning'
  abstract: 'Recent research has seen the proposal of several new inductive principles designed specifically to avoid the problems associated with maximum likelihood learning in models with intractable partition functions. In this paper, we study learning methods for binary restricted Boltzmann machines (RBMs) based on ratio matching and generalized score matching. We compare these new RBM learning methods to a range of existing learning methods including stochastic maximum likelihood, contrastive divergence, and pseudo-likelihood. We perform an extensive empirical evaluation across multiple tasks and data sets.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/marlin10a.html
  PDF: http://proceedings.mlr.press/v9/marlin10a/marlin10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-marlin10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Benjamin
    family: Marlin
  - given: Kevin
    family: Swersky
  - given: Bo
    family: Chen
  - given: Nando
    family: Freitas
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 509-516
  id: marlin10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 509
  lastpage: 516
  published: 2010-03-31 00:00:00 +0000
- title: 'Parallelizable Sampling of Markov Random Fields'
  abstract: 'Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks.  Every application of an MRF requires addressing its inference problem, which can be done using deterministic inference methods or using stochastic Markov Chain Monte Carlo methods.  In this paper we introduce a new Markov Chain transition operator that updates all the variables of a pairwise MRF in parallel by using auxiliary Gaussian variables. The proposed MCMC operator is extremely simple to implement and to parallelize. This is achieved by a formal equivalence result between arbitrary pairwise MRFs and a particular type of Restricted Boltzmann Machine.  This result also implies that the later can be learned in place of the former without any loss of modeling power, a possibility we explore in experiments.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/martens10a.html
  PDF: http://proceedings.mlr.press/v9/martens10a/martens10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-martens10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: James
    family: Martens
  - given: Ilya
    family: Sutskever
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 517-524
  id: martens10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 517
  lastpage: 524
  published: 2010-03-31 00:00:00 +0000
- title: 'Exploiting Within-Clique Factorizations in Junction-Tree Algorithms'
  abstract: 'We show that the expected computational complexity of the Junction-Tree Algorithm for maximum a posteriori inference in graphical models can be improved. Our results apply whenever the potentials over maximal cliques of the triangulated graph are factored over subcliques. This is common in many real applications, as we illustrate with several examples. The new algorithms are easily implemented, and experiments show substantial speed-ups over the classical Junction-Tree Algorithm. This enlarges the class of models for which exact inference is efficient.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/mcauley10a.html
  PDF: http://proceedings.mlr.press/v9/mcauley10a/mcauley10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-mcauley10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Julian
    family: McAuley
  - given: Tiberio
    family: Caetano
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 525-532
  id: mcauley10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 525
  lastpage: 532
  published: 2010-03-31 00:00:00 +0000
- title: 'Discriminative Topic Segmentation of Text and Speech'
  abstract: 'We explore automated discovery of topically-coherent segments in speech or text sequences. We give two new discriminative topic segmentation algorithms which employ a new measure of text similarity based on word co-occurrence. Both algorithms function by finding extrema in the similarity signal over the text, with the latter algorithm using a compact support-vector based description of a window of text or speech observations in word similarity space to overcome noise introduced by speech recognition errors and off-topic content. In experiments over speech and text news streams, we show that these algorithms outperform previous methods. We observe that topic segmentation of speech recognizer output is a more difficult problem than that of text streams; however, we demonstrate that by using a lattice of competing hypotheses rather than just the one-best hypothesis as input to the segmentation algorithm, the performance of the algorithm can be improved.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/mohri10a.html
  PDF: http://proceedings.mlr.press/v9/mohri10a/mohri10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-mohri10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mehryar
    family: Mohri
  - given: Pedro
    family: Moreno
  - given: Eugene
    family: Weinstein
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 533-540
  id: mohri10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 533
  lastpage: 540
  published: 2010-03-31 00:00:00 +0000
- title: 'Elliptical slice sampling'
  abstract: 'Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it has simple, generic code applicable to many models, 2) it has no free parameters, 3) it works well for a variety of Gaussian process based models. These properties make our method ideal for use while model building, removing the need to spend time deriving and tuning updates for more complex algorithms.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/murray10a.html
  PDF: http://proceedings.mlr.press/v9/murray10a/murray10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-murray10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Iain
    family: Murray
  - given: Ryan
    family: Adams
  - given: David
    family: MacKay
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 541-548
  id: murray10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 541
  lastpage: 548
  published: 2010-03-31 00:00:00 +0000
- title: 'Near-Optimal Evasion of Convex-Inducing Classifiers'
  abstract: 'Classifiers are often used to detect miscreant activities. We study how an adversary can efficiently query a classifier to elicit information that allows the adversary to evade detection at near-minimal cost. We generalize results of Lowd and Meek (2005) to convex-inducing classifiers. We present algorithms that construct undetected instances of near-minimal cost using only polynomially many queries in the dimension of the space and without reverse engineering the decision boundary.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/nelson10a.html
  PDF: http://proceedings.mlr.press/v9/nelson10a/nelson10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-nelson10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Blaine
    family: Nelson
  - given: Benjamin
    family: Rubinstein
  - given: Ling
    family: Huang
  - given: Anthony
    family: Joseph
  - given: Shing–hon
    family: Lau
  - given: Steven
    family: Lee
  - given: Satish
    family: Rao
  - given: Anthony
    family: Tran
  - given: Doug
    family: Tygar
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 549-556
  id: nelson10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 549
  lastpage: 556
  published: 2010-03-31 00:00:00 +0000
- title: 'Incremental Sparsification for Real-time Online Model Learning'
  abstract: 'Online model learning in real-time is required by many applications, for example, robot tracking control. It poses a difficult problem, as fast and incremental online regression with large data sets is the essential component and cannot be realized by straightforward usage of off-the-shelf machine learning methods such as Gaussian process regression or support vector regression. In this paper, we propose a framework for online, incremental sparsification with a fixed budget designed for large scale real-time model learning. The proposed approach combines a sparsification method based on an independency measure with a large scale database. In combination with an incremental learning approach such as sequential support vector regression, we obtain a regression method which is applicable in real-time online learning. It exhibits competitive learning accuracy when compared with standard regression techniques. Implementation on a real robot emphasizes the applicability of the proposed approach in real-time online model learning for real world systems.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/nguyen_tuong10a.html
  PDF: http://proceedings.mlr.press/v9/nguyen_tuong10a/nguyen_tuong10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-nguyen_tuong10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Duy
    family: Nguyen–Tuong
  - given: Jan
    family: Peters
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 557-564
  id: nguyen_tuong10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 557
  lastpage: 564
  published: 2010-03-31 00:00:00 +0000
- title: 'Fluid Dynamics Models for Low Rank Discriminant Analysis'
  abstract: 'We consider the problem of reducing the dimensionality of labeled data for classification. Unfortunately, the optimal approach of finding the low-dimensional projection with minimal Bayes classification error is intractable, so most standard algorithms optimize a tractable heuristic function in the projected subspace. Here, we investigate a physics-based model where we consider the labeled data as interacting fluid distributions. We derive the forces arising in the fluids from information theoretic potential functions, and consider appropriate low rank constraints on the resulting acceleration and velocity flow fields.  We show how to apply the Gauss principle of least constraint in fluids to obtain tractable solutions for low rank projections. Our fluid dynamic approach is demonstrated to better approximate the Bayes optimal solution on Gaussian systems, including infinite dimensional Gaussian processes.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/noh10a.html
  PDF: http://proceedings.mlr.press/v9/noh10a/noh10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-noh10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yung–Kyun
    family: Noh
  - given: Byoung–Tak
    family: Zhang
  - given: Daniel
    family: Lee
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 565-572
  id: noh10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 565
  lastpage: 572
  published: 2010-03-31 00:00:00 +0000
- title: 'Approximation of hidden Markov models by mixtures of experts with application to particle filtering'
  abstract: 'Selecting conveniently the proposal kernel and the adjustment multiplier weights of the auxiliary particle filter may increase significantly the accuracy and computational efficiency of the method. However, in practice the optimal proposal kernel and multiplier weights are seldom known. In this paper we present a simulation-based method for constructing offline an approximation of these quantities that makes the filter close to fully adapted at a reasonable computational cost. The approximation is constructed as a mixture of experts optimised through an efficient stochastic approximation algorithm. The method is illustrated on two simulated examples.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/olsson10a.html
  PDF: http://proceedings.mlr.press/v9/olsson10a/olsson10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-olsson10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jimmy
    family: Olsson
  - given: Jonas
    family: Ströjby
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 573-580
  id: olsson10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 573
  lastpage: 580
  published: 2010-03-31 00:00:00 +0000
- title: 'A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection'
  abstract: 'We propose a generalization of the Multiple-try Metropolis (MTM) algorithm of Liu et al. (2000), which is based on drawing several proposals at each step and randomly choosing one of them on the basis of weights that may be arbitrary chosen. In particular, for Bayesian estimation we also introduce a method based on weights depending on a quadratic approximation of the posterior distribution. The resulting algorithm cannot be reformulated as an MTM algorithm and leads to a comparable gain of efficiency with a lower computational effort. We also outline the extension of the proposed strategy, and then of the MTM strategy, to Bayesian model selection, casting it in a Reversible Jump framework. The approach is illustrated by real examples.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/pandolfi10a.html
  PDF: http://proceedings.mlr.press/v9/pandolfi10a/pandolfi10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-pandolfi10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Silvia
    family: Pandolfi
  - given: Francesco
    family: Bartolucci
  - given: Nial
    family: Friel
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 581-588
  id: pandolfi10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 581
  lastpage: 588
  published: 2010-03-31 00:00:00 +0000
- title: 'Bayesian structure discovery in Bayesian networks with less space'
  abstract: 'Current exact algorithms for score-based structure discovery in Bayesian networks on $n$ nodes run in time and space within a polynomial factor of $2^n$. For practical use, the space requirement is the bottleneck, which motivates trading space against time. Here, previous results on finding an optimal network structure in less space are extended in two directions. First, we consider the problem of computing the posterior probability of a given arc set. Second, we operate with the general partial order framework and its specialization to bucket orders,  introduced recently for related permutation problems. The main technical contribution is the development of a fast algorithm for a novel zeta transform variant, which may be of independent interest.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/parviainen10a.html
  PDF: http://proceedings.mlr.press/v9/parviainen10a/parviainen10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-parviainen10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pekka
    family: Parviainen
  - given: Mikko
    family: Koivisto
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 589-596
  id: parviainen10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 589
  lastpage: 596
  published: 2010-03-31 00:00:00 +0000
- title: 'Identifying Cause and Effect on Discrete Data using Additive Noise Models'
  abstract: 'Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. Recently, methods using additive noise models have been suggested to approach the case of continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work we extend the notion of additive noise models to these cases. Whenever the joint distribution $\mathbf{P}^{(X,Y)}$ admits such a model in one direction, e.g. $Y = f(X)+N$, $N \perp \!\!\! \perp X$, it does not admit the reversed model  $X=g(Y)+\tilde{N}$, $\tilde{N} \perp \!\!\! \perp Y$ as long as the model is chosen in a generic way. Based on these deliberations we propose an efficient new algorithm that is able to distinguish between cause and effect for a finite sample of discrete variables. We show that this algorithm works both on synthetic and real data sets.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/peters10a.html
  PDF: http://proceedings.mlr.press/v9/peters10a/peters10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-peters10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jonas
    family: Peters
  - given: Dominik
    family: Janzing
  - given: Bernhard
    family: Schölkopf
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 597-604
  id: peters10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 597
  lastpage: 604
  published: 2010-03-31 00:00:00 +0000
- title: 'REGO: Rank-based Estimation of Renyi Information using Euclidean Graph Optimization'
  abstract: 'We propose a new method for a non-parametric estimation of Renyi and Shannon information for a multivariate distribution using a corresponding copula, a multivariate distribution over normalized ranks of the data. As the information of the distribution is the same as the negative entropy of its copula, our method estimates this information by solving a Euclidean graph optimization problem on the empirical estimate of the distribution’s copula. Owing to the properties of the copula, we show that the resulting estimator of Renyi information is strongly consistent and robust. Further, we demonstrate its applicability in the image registration in addition to simulated experiments.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/poczos10a.html
  PDF: http://proceedings.mlr.press/v9/poczos10a/poczos10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-poczos10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Barnabas
    family: Poczos
  - given: Sergey
    family: Kirshner
  - given: Csaba
    family: Szepesvári
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 605-612
  id: poczos10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 605
  lastpage: 612
  published: 2010-03-31 00:00:00 +0000
- title: 'Infinite Predictor Subspace Models for Multitask Learning'
  abstract: 'Given several related learning tasks, we propose a nonparametric Bayesian model that captures task relatedness by assuming that the task parameters (i.e., predictors) share a latent subspace. More specifically, the intrinsic dimensionality of the task subspace is not assumed to be known a priori. We use an infinite latent feature model to automatically infer this number (depending on and limited by only the number of tasks). Furthermore, our approach is applicable when the underlying task parameter subspace is inherently sparse, drawing parallels with l1 regularization and LASSO-style models. We also propose an augmented model which can make use of (labeled, and additionally unlabeled if available) inputs to assist learning this subspace, leading to further improvements in the performance. Experimental results demonstrate the efficacy of both the proposed approaches, especially when the number of examples per task is small. Finally, we discuss an extension of the proposed framework where a nonparametric mixture of linear subspaces can be used to learn a manifold over the task parameters, and also deal with the issue of negative transfer from unrelated tasks.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/rai10a.html
  PDF: http://proceedings.mlr.press/v9/rai10a/rai10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-rai10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Piyush
    family: Rai
  - given: Hal
    family: Daumé
    suffix: III
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 613-620
  id: rai10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 613
  lastpage: 620
  published: 2010-03-31 00:00:00 +0000
- title: 'Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images'
  abstract: 'Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM) which is used as a module for learning deep belief nets one layer at a time. The Gaussian-Binary RBMs that have been used to model real-valued data are not a good way to model the covariance structure of natural images. We propose a factored 3-way RBM that uses the states of its hidden units to represent abnormalities in the local covariance structure of an image. This provides a probabilistic framework for the widely used simple/complex cell architecture. Our model learns binary features that work very well for object recognition on the “tiny images” data set. Even better features are obtained by then using standard binary RBM’s to learn a deeper model.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/ranzato10a.html
  PDF: http://proceedings.mlr.press/v9/ranzato10a/ranzato10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ranzato10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marc’Aurelio
    family: Ranzato
  - given: Alex
    family: Krizhevsky
  - given: Geoffrey
    family: Hinton
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 621-628
  id: ranzato10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 621
  lastpage: 628
  published: 2010-03-31 00:00:00 +0000
- title: 'Nonparametric prior for adaptive sparsity'
  abstract: 'For high-dimensional problems various parametric priors have been proposed to promote sparse solutions. While parametric  priors has shown considerable success they are not very robust in adapting to varying degrees of sparsity. In this work we propose a discrete mixture prior which is partially nonparametric. The right structure for the prior and the amount of sparsity is estimated directly from the data. Our experiments show that the proposed prior adapts to sparsity much better than its parametric counterparts. We  apply the proposed method to classification of high dimensional microarray datasets.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/raykar10a.html
  PDF: http://proceedings.mlr.press/v9/raykar10a/raykar10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-raykar10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vikas
    family: Raykar
  - given: Linda
    family: Zhao
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 629-636
  id: raykar10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 629
  lastpage: 636
  published: 2010-03-31 00:00:00 +0000
- title: 'Convexity of Proper Composite Binary Losses'
  abstract: 'A composite loss assigns a penalty to a real-valued prediction by associating the prediction with a probability via a link function then applying a class probability estimation (CPE) loss. If the risk for a composite loss is always minimised by predicting the value associated with the true class probability the composite loss is proper. We provide a novel, explicit and complete characterisation of the convexity of any proper composite loss in terms of its link and its “weight function” associated with its proper CPE loss.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/reid10a.html
  PDF: http://proceedings.mlr.press/v9/reid10a/reid10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-reid10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mark
    family: Reid
  - given: Robert
    family: Williamson
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 637-644
  id: reid10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 637
  lastpage: 644
  published: 2010-03-31 00:00:00 +0000
- title: 'Gaussian processes with monotonicity information'
  abstract: 'A method for using monotonicity information in multivariate Gaussian process regression and classification is proposed. Monotonicity information is introduced with virtual derivative observations, and the resulting posterior is approximated with expectation propagation. Behaviour of the method is illustrated with artificial regression examples, and the method is used in a real world health care classification problem to include monotonicity information with respect to one of the covariates.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/riihimaki10a.html
  PDF: http://proceedings.mlr.press/v9/riihimaki10a/riihimaki10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-riihimaki10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jaakko
    family: Riihimäki
  - given: Aki
    family: Vehtari
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 645-652
  id: riihimaki10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 645
  lastpage: 652
  published: 2010-03-31 00:00:00 +0000
- title: 'A Regularization Approach to Nonlinear Variable Selection'
  abstract: 'In this paper we consider a regularization approach to variable selection when the regression function depends nonlinearly on a few input variables. The proposed method is based on a regularized least square estimator penalizing large values of the partial derivatives. An efficient iterative procedure is proposed to solve the underlying variational problem, and its convergence is proved. The empirical properties of the obtained estimator are tested both for prediction and variable selection. The algorithm compares favorably to more standard ridge regression and L1 regularization schemes.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/rosasco10a.html
  PDF: http://proceedings.mlr.press/v9/rosasco10a/rosasco10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-rosasco10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lorenzo
    family: Rosasco
  - given: Matteo
    family: Santoro
  - given: Sofia
    family: Mosci
  - given: Alessandro
    family: Verri
  - given: Silvia
    family: Villa
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 653-660
  id: rosasco10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 653
  lastpage: 660
  published: 2010-03-31 00:00:00 +0000
- title: 'Efficient Reductions for Imitation Learning'
  abstract: 'Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d..  This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/ross10a.html
  PDF: http://proceedings.mlr.press/v9/ross10a/ross10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ross10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Stephane
    family: Ross
  - given: Drew
    family: Bagnell
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 661-668
  id: ross10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 661
  lastpage: 668
  published: 2010-03-31 00:00:00 +0000
- title: 'Approximate parameter inference in a stochastic reaction-diffusion model'
  abstract: 'We present an approximate inference approach to parameter estimation in a spatio-temporal stochastic process of the reaction-diffusion type. The continuous space limit of an inference method for Markov jump processes leads to an approximation which is related to a spatial Gaussian process. An efficient solution in feature space using a Fourier basis is applied to inference on simulational data.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/ruttor10a.html
  PDF: http://proceedings.mlr.press/v9/ruttor10a/ruttor10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ruttor10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Andreas
    family: Ruttor
  - given: Manfred
    family: Opper
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 669-676
  id: ruttor10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 669
  lastpage: 676
  published: 2010-03-31 00:00:00 +0000
- title: 'Active Sequential Learning with Tactile Feedback'
  abstract: 'We consider the problem of tactile discrimination, with the goal of estimating an underlying state parameter in a sequential setting. If the data is continuous and high-dimensional, collecting enough representative data samples becomes difficult. We present a framework that uses active learning to help with the sequential gathering of data samples, using information-theoretic criteria to find optimal actions at each time step. We consider two approaches to recursively update the state parameter belief: an analytical Gaussian approximation and a Monte Carlo sampling method. We show how both active frameworks improve convergence, demonstrating results on a real robotic hand-arm system that estimates the viscosity of liquids from tactile feedback data.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/saal10a.html
  PDF: http://proceedings.mlr.press/v9/saal10a/saal10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-saal10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hannes
    family: Saal
  - given: Jo–Anne
    family: Ting
  - given: Sethu
    family: Vijayakumar
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 677-684
  id: saal10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 677
  lastpage: 684
  published: 2010-03-31 00:00:00 +0000
- title: 'Reducing Label Complexity by Learning From Bags'
  abstract: 'We consider a supervised learning setting in which the main cost of learning is the number of training labels and one can obtain a single label for a bag of examples, indicating only if a positive example exists in the bag, as in Multi-Instance Learning. We thus propose to create a training sample of bags, and to use the obtained labels to learn to classify individual examples. We provide a theoretical analysis showing how to select the bag size as a function of the problem parameters, and prove that if the original labels are distributed unevenly, the number of required labels drops considerably when learning from bags. We demonstrate that finding a low-error separating hyperplane from bags is feasible in this setting using a simple iterative procedure similar to latent SVM. Experiments on synthetic and real data sets demonstrate the success of the approach.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/sabato10a.html
  PDF: http://proceedings.mlr.press/v9/sabato10a/sabato10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sabato10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sivan
    family: Sabato
  - given: Nathan
    family: Srebro
  - given: Naftali
    family: Tishby
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 685-692
  id: sabato10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 685
  lastpage: 692
  published: 2010-03-31 00:00:00 +0000
- title: 'Efficient Learning of Deep Boltzmann Machines'
  abstract: 'We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers. We show that using such a recognition model, followed by a combined top-down and bottom-up pass, it is possible to efficiently learn a good generative model of high-dimensional highly-structured sensory input. We show that the additional computations required by incorporating a top-down feedback plays a critical role in the performance of a DBM, both as a generative and discriminative model. Moreover, inference is only at most three times slower compared to the approximate inference in a Deep Belief Network (DBN), making large-scale learning of DBM’s practical. Finally, we demonstrate that the DBM’s trained using the proposed approximate inference algorithm perform well compared to DBN’s and SVM’s on the MNIST handwritten digit, OCR English letters, and NORB visual object recognition tasks.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/salakhutdinov10a.html
  PDF: http://proceedings.mlr.press/v9/salakhutdinov10a/salakhutdinov10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-salakhutdinov10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ruslan
    family: Salakhutdinov
  - given: Hugo
    family: Larochelle
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 693-700
  id: salakhutdinov10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 693
  lastpage: 700
  published: 2010-03-31 00:00:00 +0000
- title: 'Factorized Orthogonal Latent Spaces'
  abstract: 'Existing approaches to multi-view learning are particularly effective when the views are either independent (i.e, multi-kernel approaches) or fully dependent (i.e., shared latent spaces). However, in real scenarios, these assumptions are almost never truly satisfied. Recently, two methods have attempted to tackle this problem by factorizing the information and learn separate latent spaces for modeling the shared (i.e., correlated) and private (i.e., independent) parts of the data. However, these approaches are very sensitive to parameters setting or initialization. In this paper we propose a robust approach to factorizing the latent space into shared and private spaces by introducing orthogonality constraints, which penalize redundant latent representations. Furthermore, unlike previous approaches, we simultaneously learn the structure and dimensionality of the latent spaces by relying on a regularizer that encourages the latent space of each data stream to be low dimensional. To demonstrate the benefits of our approach, we apply it to two existing shared latent space models that assume full dependence of the views, the sGPLVM and the sKIE, and show that our constraints improve the performance of these models on the task of pose estimation from monocular images.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/salzmann10a.html
  PDF: http://proceedings.mlr.press/v9/salzmann10a/salzmann10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-salzmann10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mathieu
    family: Salzmann
  - given: Carl Henrik
    family: Ek
  - given: Raquel
    family: Urtasun
  - given: Trevor
    family: Darrell
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 701-708
  id: salzmann10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 701
  lastpage: 708
  published: 2010-03-31 00:00:00 +0000
- title: 'Convex Structure Learning in Log-Linear Models: Beyond Pairwise Potentials'
  abstract: 'Previous work has examined structure learning in log-linear models with $\ell_1$-regularization, largely focusing on the case of pairwise potentials.  In this work we consider the case of models with potentials of arbitrary order, but that satisfy a hierarchical constraint.  We enforce the hierarchical constraint using group $\ell_1$-regularization with overlapping groups, and an active set method that enforces hierarchical inclusion allows us to tractably consider the exponential number of higher-order potentials.  We use a spectral projected gradient method as a sub-routine for solving the overlapping group $\ell_1$-regularization problem, and make use of a sparse version of Dykstra’s algorithm to compute the projection.  Our experiments indicate that this model gives equal or better test set likelihood compared to previous models.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/schmidt10a.html
  PDF: http://proceedings.mlr.press/v9/schmidt10a/schmidt10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-schmidt10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mark
    family: Schmidt
  - given: Kevin
    family: Murphy
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 709-716
  id: schmidt10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 709
  lastpage: 716
  published: 2010-03-31 00:00:00 +0000
- title: 'Polynomial-Time Exact Inference in NP-Hard Binary MRFs via Reweighted Perfect Matching'
  abstract: 'We develop a new form of reweighting (Wainwright et al., 2005b) to leverage the relationship between Ising spin glasses and perfect matchings into a novel technique for the exact computation of MAP states in hitherto intractable binary Markov random fields. Our method solves an $n \times n$ lattice with external field and random couplings much faster, and for larger $n$, than the best competing algorithms. It empirically scales as $O(n^3)$ even though this problem is NP-hard and non-approximable in polynomial time. We discuss limitations of our current implementation and propose ways to overcome them.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/schraudolph10a.html
  PDF: http://proceedings.mlr.press/v9/schraudolph10a/schraudolph10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-schraudolph10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nic
    family: Schraudolph
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 717-724
  id: schraudolph10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 717
  lastpage: 724
  published: 2010-03-31 00:00:00 +0000
- title: 'Dense Message Passing for Sparse Principal Component Analysis'
  abstract: 'We describe a novel inference algorithm for sparse Bayesian PCA with a  zero-norm prior on the model parameters. Bayesian inference is very  challenging in probabilistic models of this type. MCMC procedures are  too slow to be practical in a very high-dimensional setting and standard  mean-field variational Bayes algorithms are ineffective.  We adopt a  dense message passing algorithm similar to algorithms developed in the  statistical  physics community and previously applied to inference  problems in coding and sparse classification. The algorithm achieves  near-optimal performance on synthetic data for which a statistical  mechanics theory of optimal learning can be derived. We also study two  gene expression datasets used in previous studies of sparse PCA. We find our  method  performs better than one published algorithm and comparably to a second.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/sharp10a.html
  PDF: http://proceedings.mlr.press/v9/sharp10a/sharp10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sharp10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kevin
    family: Sharp
  - given: Magnus
    family: Rattray
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 725-732
  id: sharp10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 725
  lastpage: 732
  published: 2010-03-31 00:00:00 +0000
- title: 'Empirical Bernstein Boosting'
  abstract: 'Concentration inequalities that incorporate variance information (such as Bernstein’s or Bennett’s inequality) are often significantly tighter than counterparts (such as Hoeffding’s inequality) that disregard variance. Nevertheless, many state of the art machine learning algorithms for classification problems like AdaBoost and support vector machines (SVMs) extensively use Hoeffding’s inequalities to justify empirical risk minimization and its variants. This article proposes a novel boosting algorithm based on a recently introduced principle–sample variance penalization–which is motivated from an empirical version of Bernstein’s inequality.  This framework leads to an efficient algorithm that is as easy to implement as AdaBoost while producing a strict generalization. Experiments on a large number of datasets show significant performance gains over AdaBoost. This paper shows that sample variance penalization could be a viable alternative to empirical risk minimization.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/shivaswamy10a.html
  PDF: http://proceedings.mlr.press/v9/shivaswamy10a/shivaswamy10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-shivaswamy10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pannagadatta
    family: Shivaswamy
  - given: Tony
    family: Jebara
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 733-740
  id: shivaswamy10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 733
  lastpage: 740
  published: 2010-03-31 00:00:00 +0000
- title: 'Reduced-Rank Hidden Markov Models'
  abstract: 'Hsu et al. (2009) recently proposed an efficient, accurate spectral learning algorithm for Hidden Markov Models (HMMs). In this paper we relax their assumptions and prove a tighter finite-sample error bound for the case of Reduced-Rank HMMs, i.e., HMMs with low-rank transition matrices. Since rank-$k$ RR-HMMs are a larger class of models than $k$-state HMMs while being equally efficient to work with, this relaxation greatly increases the learning algorithm’s scope. In addition, we generalize the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations. Finally we prove consistency for learning Predictive State Representations, an even larger class of models. Experiments on synthetic data and a toy video, as well as on difficult robot vision data, yield accurate models that compare favorably with alternatives in simulation quality and prediction accuracy.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/siddiqi10a.html
  PDF: http://proceedings.mlr.press/v9/siddiqi10a/siddiqi10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-siddiqi10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sajid
    family: Siddiqi
  - given: Byron
    family: Boots
  - given: Geoffrey
    family: Gordon
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 741-748
  id: siddiqi10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 741
  lastpage: 748
  published: 2010-03-31 00:00:00 +0000
- title: 'Detecting Weak but Hierarchically-Structured Patterns in Networks'
  abstract: 'The ability to detect weak distributed activation patterns in networks is critical to several applications, such as identifying the onset of anomalous activity or incipient congestion in the Internet, or faint traces of a biochemical spread by a sensor network.  This is a challenging problem since weak distributed patterns can be invisible in per node statistics as well as a global network-wide aggregate. Most prior work considers situations in which the activation/non-activation of each node is statistically independent, but this is unrealistic in many problems.  In this paper, we consider structured patterns arising from statistical dependencies in the activation process.  Our contributions are three-fold. First, we propose a sparsifying transform that succinctly represents structured activation patterns that conform to a hierarchical dependency graph.  Second, we establish that the proposed transform facilitates detection of very weak activation patterns that cannot be detected with existing methods. Third, we show that the structure of the hierarchical dependency graph governing the activation process, and hence the network transform, can be learnt from very few (logarithmic in network size) independent snapshots of network activity.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/singh10a.html
  PDF: http://proceedings.mlr.press/v9/singh10a/singh10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-singh10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aarti
    family: Singh
  - given: Robert
    family: Nowak
  - given: Robert
    family: Calderbank
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 749-756
  id: singh10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 749
  lastpage: 756
  published: 2010-03-31 00:00:00 +0000
- title: 'Inference of Sparse Networks with Unobserved Variables. Application to Gene Regulatory Networks'
  abstract: 'Networks are becoming a unifying framework for modeling complex systems and network inference problems are frequently encountered in many fields. Here, I develop and apply a generative approach to network inference (RCweb) for the case when the network is sparse and the latent (not observed) variables affect the observed ones. From all possible factor analysis (FA) decompositions explaining the variance in the data, RCweb selects the FA decomposition that is consistent with a sparse underlying network. The sparsity constraint is imposed by a novel method that significantly outperforms (in terms of accuracy, robustness to noise, complexity scaling and computational efficiency) methods using $\ell 1$ norm relaxation such as K-SVD and $\ell 1$-based sparse principle component analysis (PCA). Results from simulated models demonstrate that RCweb recovers exactly the model structures for sparsity as low (as non-sparse) as 50% and with ratio of unobserved to observed variables as high as 2. RCweb is robust to noise, with gradual decrease in the parameter ranges as the noise level increases.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/slavov10a.html
  PDF: http://proceedings.mlr.press/v9/slavov10a/slavov10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-slavov10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nikolai
    family: Slavov
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 757-764
  id: slavov10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 757
  lastpage: 764
  published: 2010-03-31 00:00:00 +0000
- title: 'Nonparametric Tree Graphical Models'
  abstract: 'We introduce a nonparametric representation for graphical model on trees which expresses  marginals as Hilbert space embeddings and conditionals as embedding operators.  This formulation allows us to define a graphical model solely  on the basis of the feature space representation of its variables. Thus, this nonparametric model can be applied to general domains where kernels are defined, handling challenging cases such as discrete variables whose domains are huge, or very complex, non-Gaussian continuous distributions. We also derive kernel belief propagation, a Hilbert-space algorithm for performing inference in our model.  We show that our method outperforms state-of-the-art techniques  in a cross-lingual document retrieval task and  a camera rotation estimation problem.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/song10a.html
  PDF: http://proceedings.mlr.press/v9/song10a/song10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-song10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Le
    family: Song
  - given: Arthur
    family: Gretton
  - given: Carlos
    family: Guestrin
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 765-772
  id: song10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 765
  lastpage: 772
  published: 2010-03-31 00:00:00 +0000
- title: 'On the relation between universality, characteristic kernels and RKHS embedding of measures'
  abstract: 'Universal kernels have been shown to play an important role in the achievability of the Bayes risk by many kernel-based algorithms that include binary classification, regression, etc. In this paper, we propose a notion of universality that generalizes the notions introduced by Steinwart and Micchelli et al. and study the necessary and sufficient conditions for a kernel to be universal. We show that all these notions of universality are closely linked to the injective embedding of a certain class of Borel measures into a reproducing kernel Hilbert space (RKHS). By exploiting this relation between universality and the embedding of Borel measures into an RKHS, we establish the relation between universal and characteristic kernels. The latter have been proposed in the context of the RKHS embedding of probability measures, used in statistical applications like homogeneity testing, independence testing, etc.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/sriperumbudur10a.html
  PDF: http://proceedings.mlr.press/v9/sriperumbudur10a/sriperumbudur10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sriperumbudur10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Bharath
    family: Sriperumbudur
  - given: Kenji
    family: Fukumizu
  - given: Gert
    family: Lanckriet
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 773-780
  id: sriperumbudur10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 773
  lastpage: 780
  published: 2010-03-31 00:00:00 +0000
- title: 'Conditional Density Estimation via Least-Squares Density Ratio Estimation'
  abstract: 'Estimating the conditional mean of an input-output relation is the goal of regression. However, regression analysis is not sufficiently informative if the conditional distribution has multi-modality, is highly asymmetric, or contains heteroscedastic noise. In such scenarios, estimating the conditional distribution itself would be more useful. In this paper, we propose a novel method of conditional density estimation that is suitable for multi-dimensional continuous variables. The basic idea of the proposed method is to express the conditional density in terms of the density ratio and the ratio is directly estimated without going through density estimation. Experiments using benchmark and robot transition datasets illustrate the usefulness of the proposed approach.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/sugiyama10a.html
  PDF: http://proceedings.mlr.press/v9/sugiyama10a/sugiyama10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sugiyama10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Masashi
    family: Sugiyama
  - given: Ichiro
    family: Takeuchi
  - given: Taiji
    family: Suzuki
  - given: Takafumi
    family: Kanamori
  - given: Hirotaka
    family: Hachiya
  - given: Daisuke
    family: Okanohara
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 781-788
  id: sugiyama10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 781
  lastpage: 788
  published: 2010-03-31 00:00:00 +0000
- title: 'On the Convergence Properties of Contrastive Divergence'
  abstract: 'Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD$_1$ update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely.  Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brower’s fixed point theorem.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/sutskever10a.html
  PDF: http://proceedings.mlr.press/v9/sutskever10a/sutskever10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sutskever10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ilya
    family: Sutskever
  - given: Tijmen
    family: Tieleman
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 789-795
  id: sutskever10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 789
  lastpage: 795
  published: 2010-03-31 00:00:00 +0000
- title: 'Inference and Learning in Networks of Queues'
  abstract: 'Probabilistic models of the performance of computer systems are useful both for predicting system performance in new conditions, and for diagnosing past performance problems. The most popular performance models are networks of queues. However, no current methods exist for parameter estimation or inference in networks of queues with missing data. In this paper, we present a novel viewpoint that combines queueing networks and graphical models, allowing Markov chain Monte Carlo to be applied. We demonstrate the effectiveness of our sampler on real-world data from a benchmark Web application.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/sutton10a.html
  PDF: http://proceedings.mlr.press/v9/sutton10a/sutton10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-sutton10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Charles
    family: Sutton
  - given: Michael I.
    family: Jordan
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 796-803
  id: sutton10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 796
  lastpage: 803
  published: 2010-03-31 00:00:00 +0000
- title: 'Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation'
  abstract: 'The goal of sufficient dimension reduction in supervised learning is to find the low dimensional subspace of input features that is "sufficient" for predicting output values. In this paper, we propose a novel sufficient dimension reduction method using a squared-loss variant of mutual information as a dependency measure. We utilize an analytic approximator of squared-loss mutual information based on density ratio estimation, which is shown to possess suitable convergence properties. We then develop a natural gradient algorithm for sufficient subspace search. Numerical experiments show that the proposed method compares favorably with existing dimension reduction approaches.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/suzuki10a.html
  PDF: http://proceedings.mlr.press/v9/suzuki10a/suzuki10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-suzuki10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Taiji
    family: Suzuki
  - given: Masashi
    family: Sugiyama
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 804-811
  id: suzuki10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 804
  lastpage: 811
  published: 2010-03-31 00:00:00 +0000
- title: 'HOP-MAP: Efficient Message Passing with High Order Potentials'
  abstract: 'There is a growing interest in building probabilistic models with high order potentials (HOPs), or interactions, among discrete variables.  Message passing inference in such models generally takes time exponential in the size of the interaction, but in some cases maximum a posteriori (MAP) inference can be carried out efficiently.   We build upon such results,  introducing two new classes, including composite HOPs that allow us to flexibly combine tractable HOPs using simple logical switching rules. We present efficient message update algorithms for the new HOPs, and we improve upon the efficiency of message updates for a general class of existing HOPs. Importantly, we present both new and existing HOPs in a common representation;  performing inference with any combination of these HOPs requires no change of representations or new derivations.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/tarlow10a.html
  PDF: http://proceedings.mlr.press/v9/tarlow10a/tarlow10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-tarlow10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Tarlow
  - given: Inmar
    family: Givoni
  - given: Richard
    family: Zemel
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 812-819
  id: tarlow10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 812
  lastpage: 819
  published: 2010-03-31 00:00:00 +0000
- title: 'Hartigan’s Method: k-means Clustering without Voronoi'
  abstract: 'Hartigan’s method for $k$-means clustering is the following greedy heuristic: select a point, and optimally reassign it.  This paper develops two other formulations of the heuristic, one leading to a number of consistency properties, the other showing that the data partition is always quite separated from the induced Voronoi partition.  A characterization of the volume of this separation is provided.  Empirical tests verify not only good optimization performance relative to Lloyd’s method, but also good running time.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/telgarsky10a.html
  PDF: http://proceedings.mlr.press/v9/telgarsky10a/telgarsky10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-telgarsky10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matus
    family: Telgarsky
  - given: Andrea
    family: Vattani
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 820-827
  id: telgarsky10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 820
  lastpage: 827
  published: 2010-03-31 00:00:00 +0000
- title: 'Learning Policy Improvements with Path Integrals'
  abstract: 'With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests to use the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured.   Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems.  We believe that Policy  Improvement with Path Integrals (PI$^2$) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/theodorou10a.html
  PDF: http://proceedings.mlr.press/v9/theodorou10a/theodorou10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-theodorou10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Evangelos
    family: Theodorou
  - given: Jonas
    family: Buchli
  - given: Stefan
    family: Schaal
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 828-835
  id: theodorou10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 828
  lastpage: 835
  published: 2010-03-31 00:00:00 +0000
- title: 'Unsupervised Aggregation for Classification Problems with Large Numbers of Categories'
  abstract: 'Classification problems with a very large or unbounded set of output categories are common in many areas such as natural language and image processing. In order to improve accuracy on these tasks, it is natural for a  decision-maker to  combine predictions from various  sources.  However, supervised data needed to fit an aggregation model  is often difficult to obtain, especially if needed for multiple domains. Therefore, we propose a generative model for unsupervised aggregation which exploits the agreement signal to estimate the expertise of individual judges.  Due to the large output space size, this aggregation model cannot encode expertise of constituent judges with respect to every category for all problems. Consequently, we extend it by incorporating the notion of category types  to account for variability  of the judge expertise depending on the type.  The viability of our approach is demonstrated both on synthetic experiments and on a practical task of syntactic parser aggregation.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/titov10a.html
  PDF: http://proceedings.mlr.press/v9/titov10a/titov10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-titov10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ivan
    family: Titov
  - given: Alexandre
    family: Klementiev
  - given: Kevin
    family: Small
  - given: Dan
    family: Roth
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 836-843
  id: titov10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 836
  lastpage: 843
  published: 2010-03-31 00:00:00 +0000
- title: 'Bayesian Gaussian Process Latent Variable Model'
  abstract: 'We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/titsias10a.html
  PDF: http://proceedings.mlr.press/v9/titsias10a/titsias10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-titsias10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michalis
    family: Titsias
  - given: Neil D.
    family: Lawrence
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 844-851
  id: titsias10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 844
  lastpage: 851
  published: 2010-03-31 00:00:00 +0000
- title: 'A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping'
  abstract: 'A Markov-Chain Monte Carlo based algorithm is provided to solve the simultaneous localization and mapping (SLAM) problem with general dynamical and observation models under open-loop control and provided that the map-representation is finite dimensional. To our knowledge this is the first provably consistent yet (close-to) practical solution to this problem. The superiority of our algorithm over alternative SLAM algorithms is demonstrated in a difficult loop closing situation.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/torma10a.html
  PDF: http://proceedings.mlr.press/v9/torma10a/torma10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-torma10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Peter
    family: Torma
  - given: András
    family: György
  - given: Csaba
    family: Szepesvári
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 852-859
  id: torma10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 852
  lastpage: 859
  published: 2010-03-31 00:00:00 +0000
- title: 'Learning Causal Structure from Overlapping Variable Sets'
  abstract: 'We present an algorithm name cSAT+ for learning the causal structure in a domain from datasets measuring different variables sets. The algorithm outputs a graph with edges corresponding to all possible pairwise causal relations between two variables, named Pairwise Causal Graph (PCG). Examples of interesting inferences include the induction of the absence or presence of some causal relation between two variables never measured together. cSAT+ converts the problem to a series of SAT problems, obtaining leverage from the efficiency of state-of-the-art solvers. In our empirical evaluation, it is shown to outperform ION, the first algorithm solving a similar but more general problem, by two orders of magnitude.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/triantafillou10a.html
  PDF: http://proceedings.mlr.press/v9/triantafillou10a/triantafillou10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-triantafillou10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sofia
    family: Triantafillou
  - given: Ioannis
    family: Tsamardinos
  - given: Ioannis
    family: Tollis
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 860-867
  id: triantafillou10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 860
  lastpage: 867
  published: 2010-03-31 00:00:00 +0000
- title: 'State-Space Inference and Learning with Gaussian Processes'
  abstract: 'State-space inference and learning with Gaussian processes (GPs) is an unsolved problem. We propose a new, general methodology for inference and learning in nonlinear state-space models that are described probabilistically by non-parametric GP models. We apply the expectation maximization algorithm to iterate between inference in the latent state-space and learning the parameters of the underlying GP dynamics model.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/turner10a.html
  PDF: http://proceedings.mlr.press/v9/turner10a/turner10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-turner10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ryan
    family: Turner
  - given: Marc
    family: Deisenroth
  - given: Carl
    family: Rasmussen
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 868-875
  id: turner10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 868
  lastpage: 875
  published: 2010-03-31 00:00:00 +0000
- title: 'Sequential Monte Carlo Samplers for Dirichlet Process Mixtures'
  abstract: 'In this paper, we develop a novel online algorithm based on the Sequential Monte Carlo(SMC) samplers framework for posterior inference in Dirichlet Process Mixtures (DPM). Our method generalizes  many sequential importance sampling approaches. It provides a computationally efficient improvement to particle filtering that is less prone to getting stuck in isolated modes. The proposed method is a particular SMC sampler that enables us to design sophisticated clustering update schemes, such as updating past trajectories of the particles in light of recent observations, and still ensures convergence to the true DPM target distribution asymptotically.  Performance has been evaluated in a Bayesian Infinite Gaussian mixture density estimation problem and it is shown that the proposed algorithm outperforms conventional Monte Carlo approaches in terms of estimation variance and average log-marginal likelihood.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/ulker10a.html
  PDF: http://proceedings.mlr.press/v9/ulker10a/ulker10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-ulker10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yener
    family: Ulker
  - given: Bilge
    family: Günsel
  - given: Taylan
    family: Cemgil
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 876-883
  id: ulker10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 876
  lastpage: 883
  published: 2010-03-31 00:00:00 +0000
- title: 'Guarantees for Approximate Incremental SVMs'
  abstract: 'Assume a teacher provides examples one by one. An approximate incremental SVM computes a sequence of classifiers that are close to the true SVM solutions computed on the successive incremental training sets. We show that simple algorithms can satisfy an averaged accuracy criterion with a computational cost that scales as well as the best SVM algorithms with the number of examples. Finally, we exhibit some experiments highlighting the benefits of joining fast incremental optimization and curriculum and active learning (Schon and Cohn, 2000; Bordes et al., 2005; Bengio et al., 2009).'
  volume: 9
  URL: https://proceedings.mlr.press/v9/usunier10a.html
  PDF: http://proceedings.mlr.press/v9/usunier10a/usunier10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-usunier10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nicolas
    family: Usunier
  - given: Antoine
    family: Bordes
  - given: Léon
    family: Bottou
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 884-891
  id: usunier10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 884
  lastpage: 891
  published: 2010-03-31 00:00:00 +0000
- title: 'An Alternative Prior Process for Nonparametric Bayesian Clustering'
  abstract: 'Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit “rich-get-richer” characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the “rich-get-richer” property is undesirable. We also explore the cost of this new process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. Finally, we compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/wallach10a.html
  PDF: http://proceedings.mlr.press/v9/wallach10a/wallach10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-wallach10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hanna
    family: Wallach
  - given: Shane
    family: Jensen
  - given: Lee
    family: Dicker
  - given: Katherine
    family: Heller
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 892-899
  id: wallach10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 892
  lastpage: 899
  published: 2010-03-31 00:00:00 +0000
- title: 'A Potential-based Framework for Online Multi-class Learning with Partial Feedback'
  abstract: 'We study the problem of online multi-class learning with partial feedback: in each trial of online learning, instead of providing the true class label for a given instance, the oracle will only reveal to the learner if the predicted class label is correct. We present a general framework for online multi-class learning with partial feedback that adapts the potential-based gradient descent approaches (Cesa-Bianchi & Lugosi, 2006). The generality of the proposed framework is verified by the fact that Banditron (Kakade et al., 2008) is indeed a special case of our work if the potential function is set to be the squared $L_2$ norm of the weight vector. We propose an exponential gradient algorithm for online multi-class learning with partial feedback. Compared to the Banditron algorithm, the exponential gradient algorithm is advantageous in that its mistake bound is independent from the dimension of data, making it suitable for classifying high dimensional data. Our empirical study with four data sets show that the proposed algorithm for online learning with partial feedback is more effective than the Banditron algorithm.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/wang10a.html
  PDF: http://proceedings.mlr.press/v9/wang10a/wang10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-wang10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shijun
    family: Wang
  - given: Rong
    family: Jin
  - given: Hamed
    family: Valizadegan
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 900-907
  id: wang10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 900
  lastpage: 907
  published: 2010-03-31 00:00:00 +0000
- title: 'Online Passive-Aggressive Algorithms on a Budget'
  abstract: 'In this paper a kernel-based online learning algorithm, which has both constant space and update time, is proposed. The approach is based on the popular online Passive-Aggressive (PA) algorithm. When used in conjunction with kernel function, the number of support vectors in PA grows without bounds when learning from noisy data streams. This implies unlimited memory and ever increasing model update and prediction time. To address this issue, the proposed budgeted PA algorithm maintains only a fixed number of support vectors. By introducing an additional constraint to the original PA optimization problem, a closed-form solution was derived for the support vector removal and model update. Using the hinge loss we developed several budgeted PA algorithms that can trade between accuracy and update cost. We also developed the ramp loss versions of both original and budgeted PA and showed that the resulting algorithms can be interpreted as the combination of active learning and hinge loss PA. All proposed algorithms were comprehensively tested on 7 benchmark data sets. The experiments showed that they are superior to the existing budgeted online algorithms. Even with modest budgets, the budgeted PA achieved very competitive accuracies to the non-budgeted PA and kernel perceptron algorithms.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/wang10b.html
  PDF: http://proceedings.mlr.press/v9/wang10b/wang10b.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-wang10b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhuang
    family: Wang
  - given: Slobodan
    family: Vucetic
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 908-915
  id: wang10b
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 908
  lastpage: 915
  published: 2010-03-31 00:00:00 +0000
- title: 'Structured Prediction Cascades'
  abstract: 'Structured prediction tasks pose a fundamental trade-off between the need for model complexity to increase predictive power and the limited computational resources for inference in the exponentially-sized output spaces such models require. We formulate and develop structured prediction cascades: a sequence of increasingly complex models that progressively filter the space of possible outputs. We represent an exponentially large set of filtered outputs using max marginals and propose a novel convex loss function that balances filtering error with filtering efficiency. We provide generalization bounds for these loss functions and evaluate our approach on handwriting recognition and part-of-speech tagging. We find that the learned cascades are capable of reducing the complexity of inference by up to five orders of magnitude, enabling the use of models which incorporate higher order features and yield higher accuracy.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/weiss10a.html
  PDF: http://proceedings.mlr.press/v9/weiss10a/weiss10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-weiss10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Weiss
  - given: Benjamin
    family: Taskar
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 916-923
  id: weiss10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 916
  lastpage: 923
  published: 2010-03-31 00:00:00 +0000
- title: 'Dependent Indian Buffet Processes'
  abstract: 'Latent variable models represent hidden structure in observational data.To account for the distribution of the observational data changing over time, space or some other covariate, we need generalizations of latent variable models that explicitly capture this dependency on the covariate. A variety of such generalizations has been proposed for latent variable models based on the Dirichlet process. We address dependency on covariates in binary latent feature models, by introducing a dependent Indian buffet process. The model generates, for each value of the covariate, a binary random matrix with an unbounded number of columns.  Evolution of the binary matrices over the covariate set is controlled by a hierarchical Gaussian process model. The choice of covariance functions controls the dependence structure and exchangeability properties of the model. We derive a Markov Chain Monte Carlo sampling algorithm for  Bayesian inference, and provide experiments on both synthetic and real-world data. The experimental results show that explicit modeling of dependencies significantly improves accuracy of predictions.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/williamson10a.html
  PDF: http://proceedings.mlr.press/v9/williamson10a/williamson10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-williamson10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sinead
    family: Williamson
  - given: Peter
    family: Orbanz
  - given: Zoubin
    family: Ghahramani
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 924-931
  id: williamson10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 924
  lastpage: 931
  published: 2010-03-31 00:00:00 +0000
- title: 'Modeling annotator expertise: Learning when everybody knows a bit of something'
  abstract: 'Supervised learning from multiple labeling sources is an increasingly important problem in machine learning and data mining. This paper develops a probabilistic approach to this problem when annotators may be unreliable (labels are noisy), but also their expertise varies depending on the data they observe (annotators may have knowledge about different parts of the input space). That is, an annotator may not be consistently accurate (or inaccurate) across the task domain. The presented approach produces classification and annotator models that allow us to provide estimates of the true labels and annotator variable expertise. We provide an analysis of the proposed model under various scenarios and show experimentally that annotator expertise can indeed vary in real tasks and that the presented approach provides clear advantages over previously introduced multi-annotator methods, which only consider general annotator characteristics.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/yan10a.html
  PDF: http://proceedings.mlr.press/v9/yan10a/yan10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-yan10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yan
    family: Yan
  - given: Romer
    family: Rosales
  - given: Glenn
    family: Fung
  - given: Mark
    family: Schmidt
  - given: Gerardo
    family: Hermosillo
  - given: Luca
    family: Bogoni
  - given: Linda
    family: Moy
  - given: Jennifer
    family: Dy
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 932-939
  id: yan10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 932
  lastpage: 939
  published: 2010-03-31 00:00:00 +0000
- title: 'A highly efficient blocked Gibbs sampler reconstruction of multidimensional NMR spectra'
  abstract: 'Projection Reconstruction Nuclear Magnetic Resonance (PR-NMR) is a new technique to generate multi-dimensional NMR spectra, which have discrete features that are relatively sparsely distributed in space. A small number of projections from lower dimensional NMR spectra are used to reconstruct the multi-dimensional NMR spectra. We propose an efficient algorithm which employs a blocked Gibbs sampler to accurately reconstruct NMR spectra. This statistical method generates samples in Bayesian scheme. Our proposed algorithm is tested on a set of six projections derived from the three-dimensional 700 MHz HNCO spectrum of HasA, a 187-residue heme binding protein.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/yoon10a.html
  PDF: http://proceedings.mlr.press/v9/yoon10a/yoon10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-yoon10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ji Won
    family: Yoon
  - given: Simon
    family: Wilson
  - given: K. Hun
    family: Mok
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 940-947
  id: yoon10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 940
  lastpage: 947
  published: 2010-03-31 00:00:00 +0000
- title: 'Risk Bounds for Levy Processes in the PAC-Learning Framework'
  abstract: 'Levy processes play an important role in the stochastic process theory. However, since samples are non-i.i.d., statistical learning results based on the i.i.d. scenarios cannot be utilized to study the risk bounds for Levy processes. In this paper, we present risk bounds for non-i.i.d. samples drawn from Levy processes in the PAC-learning framework. In particular, by using a concentration inequality for infinitely divisible distributions, we first prove that the function of risk error is Lipschitz continuous with a high probability, and then by using a specific concentration inequality for Levy processes, we obtain the risk bounds for non-i.i.d. samples drawn from Levy processes without Gaussian components. Based on the resulted risk bounds, we analyze the factors that affect the convergence of the risk bounds and then prove the convergence.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/zhang10a.html
  PDF: http://proceedings.mlr.press/v9/zhang10a/zhang10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chao
    family: Zhang
  - given: Dacheng
    family: Tao
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 948-955
  id: zhang10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 948
  lastpage: 955
  published: 2010-03-31 00:00:00 +0000
- title: 'Bayesian Online Learning for Multi-label and Multi-variate Performance Measures'
  abstract: 'Many real world applications employ multi-variate performance measures and each example can belong to multiple classes. The currently most popular approaches train an SVM for each class, followed by ad hoc thresholding.  Probabilistic models using Bayesian decision theory are also commonly adopted.  In this paper, we propose a Bayesian online multi-label classification framework (BOMC) which learns a probabilistic linear classifier. The likelihood is modeled by a graphical model similar to TrueSkill$^\text{TM}$, and inference is based on Gaussian density filtering with expectation propagation.  Using samples from the posterior, we label the testing data by maximizing the expected $F_1$-score. Our experiments on Reuters1-v2 dataset show BOMC compares favorably to the state-of-the-art online learners in macro-averaged $F_1$-score and training time.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/zhang10b.html
  PDF: http://proceedings.mlr.press/v9/zhang10b/zhang10b.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xinhua
    family: Zhang
  - given: Thore
    family: Graepel
  - given: Ralf
    family: Herbrich
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 956-963
  id: zhang10b
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 956
  lastpage: 963
  published: 2010-03-31 00:00:00 +0000
- title: 'Multi-Task Learning using Generalized t Process'
  abstract: 'Multi-task learning seeks to improve the generalization performance of a learning task with the help of other related learning tasks.  Among the multi-task learning methods proposed thus far, Bonilla et al.’s method provides a novel multi-task extension of Gaussian process (GP) by using a task covariance matrix to model the relationships between tasks. However, learning the task covariance matrix directly has both computational and representational drawbacks. In this paper, we propose a Bayesian extension by modeling the task covariance matrix as a random matrix with an inverse-Wishart prior and integrating it out to achieve Bayesian model averaging. To make the computation feasible, we first give an alternative weight-space view of Bonilla et al.’s multi-task GP model and then integrate out the task covariance matrix in the model, leading to a multi-task generalized t process (MTGTP). For the likelihood, we use a generalized t noise model which, together with the generalized t process prior, brings about the robustness advantage as well as an analytical form for the marginal likelihood.  In order to specify the inverse-Wishart prior, we use the maximum mean discrepancy (MMD) statistic to estimate the parameter matrix of the inverse-Wishart prior. Moreover, we investigate some theoretical properties of MTGTP, such as its asymptotic analysis and learning curve. Comparative experimental studies on two common multi-task learning applications show very promising results.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/zhang10c.html
  PDF: http://proceedings.mlr.press/v9/zhang10c/zhang10c.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yu
    family: Zhang
  - given: Dit–Yan
    family: Yeung
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 964-971
  id: zhang10c
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 964
  lastpage: 971
  published: 2010-03-31 00:00:00 +0000
- title: 'Bayesian Generalized Kernel Models'
  abstract: 'We propose a fully Bayesian approach for generalized kernel models (GKMs), which are extensions of generalized linear models in the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman’s g-prior on the regression vector of GKMs. This mixture prior allows a fraction of the regression vector to be zero.  Thus, it serves for sparse modeling and Bayesian computation.  For inference, we exploit data augmentation methodology to develop a Markov chain Monte Carlo (MCMC) algorithm in which the reversible jump method is used for model selection and a Bayesian model averaging method is used for posterior prediction.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/zhang10d.html
  PDF: http://proceedings.mlr.press/v9/zhang10d/zhang10d.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhihua
    family: Zhang
  - given: Guang
    family: Dai
  - given: Donghui
    family: Wang
  - given: Michael I.
    family: Jordan
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 972-979
  id: zhang10d
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 972
  lastpage: 979
  published: 2010-03-31 00:00:00 +0000
- title: 'Matrix-Variate Dirichlet Process Mixture Models'
  abstract: 'We are concerned with a multivariate response regression problem where the interest is in considering correlations both across response variates and across response samples. In this paper we develop a new Bayesian nonparametric model for such a setting based on Dirichlet process priors. Building on an additive kernel model, we allow each sample to have its own regression matrix. Although this overcomplete representation could in principle suffer from severe overfitting problems, we are able to provide effective control over the model via a matrix-variate Dirichlet process prior on the regression matrices. Our model is able to share statistical strength among regression matrices due to the clustering property of the Dirichlet process. We make use of a Markov chain Monte Carlo algorithm for inference and prediction. Compared with other Bayesian kernel models, our model has advantages in both computational and statistical efficiency.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/zhang10e.html
  PDF: http://proceedings.mlr.press/v9/zhang10e/zhang10e.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhang10e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhihua
    family: Zhang
  - given: Guang
    family: Dai
  - given: Michael I.
    family: Jordan
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 980-987
  id: zhang10e
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 980
  lastpage: 987
  published: 2010-03-31 00:00:00 +0000
- title: 'Exclusive Lasso for Multi-task Feature Selection'
  abstract: 'We propose a novel group regularization which we call exclusive lasso. Unlike the group lasso regularizer that assumes co-varying variables in groups, the proposed exclusive lasso regularizer models the scenario when variables in the same group compete with each other. Analysis is presented to illustrate the properties of the proposed regularizer. We present a framework of kernel-based multi-task feature selection algorithm based on the proposed exclusive lasso regularizer. An efficient algorithm is derived to solve the related optimization problem. Experiments with document categorization show that our approach outperforms state-of-the-art algorithms for multi-task feature selection.'
  volume: 9
  URL: https://proceedings.mlr.press/v9/zhou10a.html
  PDF: http://proceedings.mlr.press/v9/zhou10a/zhou10a.pdf
  edit: https://github.com/mlresearch//v9/edit/gh-pages/_posts/2010-03-31-zhou10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yang
    family: Zhou
  - given: Rong
    family: Jin
  - given: Steven Chu–Hong
    family: Hoi
  editor: 
  - given: Yee Whye
    family: Teh
  - given: Mike
    family: Titterington
  address: Chia Laguna Resort, Sardinia, Italy
  page: 988-995
  id: zhou10a
  issued:
    date-parts: 
      - 2010
      - 3
      - 31
  firstpage: 988
  lastpage: 995
  published: 2010-03-31 00:00:00 +0000