- title: 'Algorithmic Learning Theory (ALT) 2017: Preface'
  volume: 76
  URL: https://proceedings.mlr.press/v76/hanneke17a.html
  PDF: http://proceedings.mlr.press/v76/hanneke17a/hanneke17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-hanneke17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 1-2
  id: hanneke17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 1
  lastpage: 2
  published: 2017-10-11 00:00:00 +0000
- title: 'New bounds on the price of bandit feedback for mistake-bounded online multiclass learning'
  abstract: 'This paper is about two generalizations of the mistake bound model to online multiclass classification. In the <i>standard model</i>, the learner receives the correct classification at the end of each round, and in the <i>bandit model</i>, the learner only finds out whether its prediction was correct or not.  For a set $F$ of multiclass classifiers, let $\mathrm{opt}_{\mathrm{std}}(F)$ and $\mathrm{opt}_{\mathrm{bandit}}(F)$ be the optimal bounds for learning $F$ according to these two models.  We show that an $$ \mathrm{opt}_{\mathrm{bandit}}(F) \leq (1 + o(1)) (|Y| \ln |Y|) \mathrm{opt}_{\mathrm{std}}(F) $$ bound is the best possible up to the leading constant, closing a $\Theta(\log |Y|)$ factor gap.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/long17a.html
  PDF: http://proceedings.mlr.press/v76/long17a/long17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-long17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Philip M.
    family: Long
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 3-10
  id: long17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 3
  lastpage: 10
  published: 2017-10-11 00:00:00 +0000
- title: 'Minimax rates for cost-sensitive learning on manifolds with approximate nearest neighbours'
  abstract: 'We study the approximate nearest neighbour method for cost-sensitive classification on low-dimensional manifolds embedded within a high-dimensional feature space. We determine the minimax learning rates for distributions on a smooth manifold, in a cost-sensitive setting. This generalises a classic result of Audibert and Tsybakov. Building upon recent work of Chaudhuri and Dasgupta we prove that these minimax rates are attained by the approximate nearest neighbour algorithm, where neighbours are computed in a randomly projected low-dimensional space. In addition, we give a bound on the number of dimensions required for the projection which depends solely upon the <i>reach</i> and dimension of the manifold, combined with the regularity of the marginal.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/reeve17a.html
  PDF: http://proceedings.mlr.press/v76/reeve17a/reeve17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-reeve17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Henry W. J.
    family: Reeve
  - given: Gavin
    family: Brown
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 11-56
  id: reeve17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 11
  lastpage: 56
  published: 2017-10-11 00:00:00 +0000
- title: 'Universality of Bayesian mixture predictors'
  abstract: 'The problem is that of sequential probability forecasting for discrete-valued time series. The data is generated by an unknown  probability distribution over the space of all one-way infinite sequences. It is known that this measure belongs to a given set $\mathcal{C}$, but the latter is completely arbitrary (uncountably infinite, without any structure given).  The performance is measured by  asymptotic average log loss.  In this work it is shown that the minimax asymptotic performance  is always attainable, and it is attained by a Bayesian mixture over countably many measures from the set $\mathcal{C}$. This was previously only known for the case when the best achievable asymptotic error is 0. The new result can  be interpreted  as a complete-class theorem for prediction. It also contrasts previous results that show that in the non-realizable case all Bayesian mixtures may be suboptimal. This leads to a very general conclusion concerning model selection for a problem of sequential inference: it is better to take a model large enough to make sure it includes the process that generates the data, even if it entails positive asymptotic average loss, for otherwise any combination of predictors in the model class may be useless.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/ryabko17a.html
  PDF: http://proceedings.mlr.press/v76/ryabko17a/ryabko17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-ryabko17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Daniil
    family: Ryabko
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 57-71
  id: ryabko17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 57
  lastpage: 71
  published: 2017-10-11 00:00:00 +0000
- title: 'Erasing Pattern Languages Distinguishable by a Finite Number of Strings'
  abstract: 'Pattern languages have been an object of study in various subfields of computer science for decades. This paper introduces and studies a decision problem on patterns called the <i>finite distinguishability</i> problem: given a pattern $\pi$, are there finite sets $T^+$ and $T^-$ of strings such that the only pattern language containing all strings in $T^+$ and none of the strings in $T^-$ is the language generated by $\pi$? This problem is related to the complexity of teacher-directed learning, as studied in computational learning theory, as well as to the long-standing open question whether the equivalence of two patterns is decidable. We show that finite distinguishability is decidable if the underlying alphabet is of size other than $2$ or $3$, and provide a number of related results, such as (i) partial solutions for alphabet sizes $2$ and $3$, and (ii) decidability proofs for variants of the problem for special subclasses of patterns, namely, regular, 1-variable, and non-cross patterns. For the same subclasses, we further determine the values of two complexity parameters in teacher-directed learning, namely the <i>teaching dimension</i> and the <i>recursive teaching dimension</i>.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/bayeh17a.html
  PDF: http://proceedings.mlr.press/v76/bayeh17a/bayeh17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-bayeh17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Fahimeh
    family: Bayeh
  - given: Ziyuan
    family: Gao
  - given: Sandra
    family: Zilles
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 72-108
  id: bayeh17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 72
  lastpage: 108
  published: 2017-10-11 00:00:00 +0000
- title: 'Non-Adaptive Randomized Algorithm for Group Testing'
  abstract: 'We study the problem of group testing with a non-adaptive randomized algorithm in the random incidence design (RID) model where each entry in the test is chosen randomly independently from $\{0,1\}$ with a fixed probability $p$. <br><br> The property that is sufficient and necessary for a unique decoding is the separability of the tests, but unfortunately no linear time algorithm is known for such tests. In order to achieve linear-time decodable tests, the algorithms in the literature use the disjunction property that gives almost optimal number of tests. <br><br> We define a new property for the tests which we call semi-disjunction property. We show that there is a linear time decoding for such test and for $d\to \infty$ the number of tests converges to the number of tests with the separability property. Our analysis shows that, in the RID model, the number of tests in our algorithm is better than the one with the disjunction property even for small $d$.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/bshouty17a.html
  PDF: http://proceedings.mlr.press/v76/bshouty17a/bshouty17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-bshouty17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Nader H.
    family: Bshouty
  - given: Nuha
    family: Diab
  - given: Shada R.
    family: Kawar
  - given: Robert J.
    family: Shahla
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 109-128
  id: bshouty17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 109
  lastpage: 128
  published: 2017-10-11 00:00:00 +0000
- title: 'Automatic Learning from Repetitive Texts'
  abstract: 'We study the connections between the learnability of automatic families of languages and the types of text used to present them to a learner. More precisely, we study how restrictions on the number of times that a correct datum appears in a text influence what classes of languages are automatically learnable. We show that an automatic family of languages is automatically learnable from fat text iff it is automatically learnable from thick text iff it is verifiable from balanced text iff it satisfies Angluin''s tell-tale condition. Furthermore, many automatic families are automatically learnable from exponential text. We also study the relationship between automatic learnability and verifiability and show that all automatic families are automatically partially verifiable from exponential text and automatically learnable from thick text.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/h%C3%B6lzl17a.html
  PDF: http://proceedings.mlr.press/v76/hölzl17a/hölzl17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-hölzl17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Rupert
    family: Hölzl
  - given: Sanjay
    family: Jain
  - given: Philipp
    family: Schlicht
  - given: Karen
    family: Seidel
  - given: Frank
    family: Stephan
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 129-150
  id: hölzl17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 129
  lastpage: 150
  published: 2017-10-11 00:00:00 +0000
- title: 'Boundary Crossing for General Exponential Families'
  abstract: 'We consider parametric exponential families of dimension $K$ on the real line. We study a variant of <i>boundary crossing probabilities</i> coming from the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension $K$. Formally, our result is a concentration inequality that bounds the probability that $\mathcal{B}^\psi(\hat \theta_n,\theta^\star)\geq f(t/n)/n$, where $\theta^\star$ is the parameter of an unknown target distribution, $\hat \theta_n$ is the empirical parameter estimate built from $n$ observations, $\psi$ is the log-partition function of the exponential family and $\mathcal{B}^\psi$ is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function $f$ is logarithmic, as it enables to analyze the regret of the state-of-the-art KL-ucb and KL-ucb+ strategies, whose analysis was left open in such generality. Indeed, previous results only hold for the case when $K=1$, while we provide results for arbitrary finite dimension $K$, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques to achieve these strong results  already  existed three decades ago in the work of T.L. Lai, and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques that we believe are useful beyond the application to stochastic multi-armed bandits.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/maillard17a.html
  PDF: http://proceedings.mlr.press/v76/maillard17a/maillard17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-maillard17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Odalric-Ambrym
    family: Maillard
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 151-184
  id: maillard17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 151
  lastpage: 184
  published: 2017-10-11 00:00:00 +0000
- title: 'Preference-based Teaching of Unions of Geometric Objects'
  abstract: 'This paper studies exact learning of unions of non-discretized geometric concepts in the model of preference-based teaching. In particular, it focuses on upper and lower bounds of the corresponding sample complexity parameter, the preference-based teaching dimension (PBTD), when learning disjoint unions of a bounded number of geometric concepts of various types -- for instance balls, axis-aligned cubes, or axis-aligned boxes -- in arbitrary dimensions. It is shown that the PBTD of disjoint unions of some such types of concepts grows linearly with the number of concepts in the union, independent of the dimensionality. Teaching the union of potentially overlapping objects turns out to be more involved and is hence considered here only for unions of up to two objects.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/gao17a.html
  PDF: http://proceedings.mlr.press/v76/gao17a/gao17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-gao17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Ziyuan
    family: Gao
  - given: David
    family: Kirkpatrick
  - given: Christoph
    family: Ries
  - given: Hans
    family: Simon
  - given: Sandra
    family: Zilles
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 185-207
  id: gao17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 185
  lastpage: 207
  published: 2017-10-11 00:00:00 +0000
- title: 'Specifying a positive threshold function via extremal points'
  abstract: 'An extremal point of a positive threshold Boolean function $f$ is either a maximal zero or a minimal one. It is known that if $f$ depends on all its variables, then the set of its extremal points completely specifies $f$ within the universe of threshold functions. However, in some cases, $f$ can be specified by a smaller set. The minimum number of points in such a set is the specification number of $f$. Hu (1965) showed that the specification number of a threshold function of $n$ variables is at least $n+1$.  Anthony et al. (1995) proved that this bound is attained for nested functions and conjectured that for all other threshold functions the specification number is strictly greater than $n+1$. In the present paper, we resolve this conjecture negatively by exhibiting threshold Boolean functions of $n$ variables, which are non-nested and for which the specification number is $n+1$. On the other hand, we show that the set of extremal points satisfies the statement of the conjecture, i.e.~a positive threshold Boolean function depending on all its $n$ variables has $n+1$ extremal points if and only if it is nested. To prove this, we reveal an underlying structure of the set of extremal points.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/lozin17a.html
  PDF: http://proceedings.mlr.press/v76/lozin17a/lozin17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-lozin17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Vadim
    family: Lozin
  - given: Igor
    family: Razgon
  - given: Viktor
    family: Zamaraev
  - given: Elena
    family: Zamaraeva
  - given: Nikolai Yu.
    family: Zolotykh
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 208-222
  id: lozin17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 208
  lastpage: 222
  published: 2017-10-11 00:00:00 +0000
- title: 'A minimax and asymptotically optimal algorithm for stochastic bandits'
  abstract: 'We propose the $\text{kl-UCB}^{++}$ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins'' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/m%C3%A9nard17a.html
  PDF: http://proceedings.mlr.press/v76/ménard17a/ménard17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-ménard17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Pierre
    family: Ménard
  - given: Aurélien
    family: Garivier
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 223-237
  id: ménard17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 223
  lastpage: 237
  published: 2017-10-11 00:00:00 +0000
- title: 'Graph Verification with a Betweenness Oracle'
  abstract: 'In this paper, we examine the query complexity of verifying a hidden graph $G$ with a betweenness oracle. Let $G=(V,E)$ be a hidden graph and $\hat{G}=(V,\hat{E})$ be a known graph. $V$ and $\hat{E}$ are known and $E$ is not known. The graphs are connected, unweighted and have bounded maximum degree $\Delta$. The task of the graph verification problem is to verify that $E=\hat{E}$. We have access to $G$ through a black-box betweenness oracle. A betweenness oracle returns whether a vertex lies along a shortest path between two other vertices. The betweenness oracle nicely captures many real-world problems. We prove that graph verification can be done using $n^{1+o(1)}$ betweenness queries. Surprisingly, this matches the state of the art for the graph verification problem with the much stronger distance oracle. We also prove that graph verification requires $\Omega(n)$ betweenness queries -- a matching lower bound.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/janardhanan17a.html
  PDF: http://proceedings.mlr.press/v76/janardhanan17a/janardhanan17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-janardhanan17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Mano Vikash
    family: Janardhanan
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 238-249
  id: janardhanan17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 238
  lastpage: 249
  published: 2017-10-11 00:00:00 +0000
- title: 'Lifelong Learning in Costly Feature Spaces'
  abstract: 'An important long-term goal in machine learning systems is to build learning agents that, like humans, can learn many tasks over their lifetime, and moreover use information from these tasks to improve their ability to do so efficiently. In this work, our goal is to provide new theoretical insights into the potential of this paradigm. In particular, we propose a lifelong learning framework that adheres to a novel notion of resource efficiency that is critical in many real-world domains where feature evaluations are costly. That is, our learner aims to reuse information from previously learned related tasks to learn future tasks in a <i>feature-efficient</i> manner. Furthermore, we consider novel combinatorial ways in which learning tasks can relate.  Specifically, we design lifelong learning algorithms for two structurally different and widely used families of target functions: decision trees/lists and monomials/polynomials. We also provide strong feature-efficiency guarantees for these algorithms; in fact, we show that in order to learn future targets, we need only slightly more feature evaluations per training example than what is needed to predict on an arbitrary example using those targets. We also provide algorithms with guarantees in an agnostic model where not all the targets are related to each other. Finally, we also provide lower bounds on the performance of a lifelong learner in these models, which are in fact tight under some conditions.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/balcan17a.html
  PDF: http://proceedings.mlr.press/v76/balcan17a/balcan17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-balcan17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Maria-Florina
    family: Balcan
  - given: Avrim
    family: Blum
  - given: Vaishnavh
    family: Nagarajan
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 250-287
  id: balcan17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 250
  lastpage: 287
  published: 2017-10-11 00:00:00 +0000
- title: 'Collaborative Clustering: Sample Complexity and Efficient Algorithms'
  abstract: 'We study the problem of <i>collaborative clustering</i>. This problem is concerned with a set of items grouped into clusters that we wish to recover from ratings provided by users. The latter are also clustered, and each user rates a random but typical small number of items. The observed ratings are random variables whose distributions depend on the item and user clusters only. Unlike for collaborative filtering problems where one needs to recover both user and item clusters, here we only wish to classify items. The number of items rated by a user can be so small that anyway, estimating user clusters may be hopeless. For the collaborative clustering problem, we derive fundamental performance limits satisfied by any algorithm. Specifically, we identify the number of ratings needed to guarantee the existence of an algorithm recovering the clusters with a prescribed level of accuracy. We also propose SplitSpec, an algorithm whose performance matches these fundamental performance limit order-wise. In turn, SplitSpec is able to exploit, as much as this is possible, the users&#8217 structure to improve the item cluster estimates.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/ok17a.html
  PDF: http://proceedings.mlr.press/v76/ok17a/ok17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-ok17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Jungseul
    family: Ok
  - given: Se-Young
    family: Yun
  - given: Alexandre
    family: Proutiere
  - given: Rami
    family: Mochaourab
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 288-329
  id: ok17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 288
  lastpage: 329
  published: 2017-10-11 00:00:00 +0000
- title: 'Parameter identification in Markov chain choice models'
  abstract: 'This work studies the parameter identification problem for the Markov chain choice model of Blanchet, Gallego, and Goyal used in assortment planning.  In this model, the product selected by a customer is determined by a Markov chain over the products, where the products in the offered assortment are absorbing states.  The underlying parameters of the model were previously shown to be identifiable from the choice probabilities for the all-products assortment, together with choice probabilities for assortments of all-but-one products.  Obtaining and estimating choice probabilities for such large assortments is not desirable in many settings.  The main result of this work is that the parameters may be identified from assortments of sizes two and three, regardless of the total number of products.  The result is obtained via a simple and efficient parameter recovery algorithm.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/gupta17a.html
  PDF: http://proceedings.mlr.press/v76/gupta17a/gupta17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-gupta17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Arushi
    family: Gupta
  - given: Daniel
    family: Hsu
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 330-340
  id: gupta17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 330
  lastpage: 340
  published: 2017-10-11 00:00:00 +0000
- title: 'The Complexity of Explaining Neural Networks Through (group) Invariants'
  abstract: 'Ever since the work of Minsky and Papert, it has been thought that neural networks derive their effectiveness by finding representations of the data that are invariant with respect to the task. In other words, the representations eliminate components of the data that vary in a way that is irrelevant. These invariants are naturally expressed with respect to group operations, and thus an understanding of these groups is key to explaining the effectiveness of the neural network. Moreover, a line of work in deep learning has shown that explicit knowledge of group invariants can lead to more effective training results. <br><br> In this paper, we investigate the difficulty of discovering anything about these implicit invariants. Unfortunately, our main results are negative: we show that a variety of questions around investigating invariant representations are NP-hard, even in approximate settings. Moreover, these results do not depend on the kind of architecture used: in fact, our results follow as soon as the network architecture is powerful enough to be universal. The key idea behind our results is that if we can find the symmetries of a problem then we can solve it.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/ensign17a.html
  PDF: http://proceedings.mlr.press/v76/ensign17a/ensign17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-ensign17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Danielle
    family: Ensign
  - given: Scott
    family: Neville
  - given: Arnab
    family: Paul
  - given: Suresh
    family: Venkatasubramanian
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 341-359
  id: ensign17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 341
  lastpage: 359
  published: 2017-10-11 00:00:00 +0000
- title: 'An efficient query learning algorithm for zero-suppressed binary decision diagrams'
  abstract: 'A ZDD is a directed acyclic graph that represents a family of sets over a fixed universe set. In this paper, we propose an algorithm that learns zero-suppressed binary decision diagrams (ZDDs) using membership and equivalence queries. If the target ZDD has $n$ nodes and the cardinality of the universe is $m$, our algorithm uses $n$ equivalence queries and at most $n(\lfloor \log m \rfloor + 4n)$ membership queries to learn the target ZDD.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/mizumoto17a.html
  PDF: http://proceedings.mlr.press/v76/mizumoto17a/mizumoto17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-mizumoto17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Hayato
    family: Mizumoto
  - given: Shota
    family: Todoroki
  - given: 
    family: Diptarama
  - given: Ryo
    family: Yoshinaka
  - given: Ayumi
    family: Shinohara
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 360-371
  id: mizumoto17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 360
  lastpage: 371
  published: 2017-10-11 00:00:00 +0000
- title: 'Soft-Bayes: Prod for Mixtures of Experts with Log-Loss'
  abstract: 'We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms. We argue that existing algorithms such as exponentiated gradient, online gradient descent and online Newton step do not adequately satisfy both requirements. Our main contribution is an analysis of the Prod algorithm that is robust to any data sequence and runs in linear time relative to the number of experts in each round. Despite the unbounded nature of the log-loss, we derive a bound that is independent of the largest loss and of the largest gradient, and depends only on the number of experts and the time horizon. Furthermore we give a Bayesian interpretation of Prod and adapt the algorithm to derive a tracking regret.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/orseau17a.html
  PDF: http://proceedings.mlr.press/v76/orseau17a/orseau17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-orseau17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Laurent
    family: Orseau
  - given: Tor
    family: Lattimore
  - given: Shane
    family: Legg
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 372-399
  id: orseau17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 372
  lastpage: 399
  published: 2017-10-11 00:00:00 +0000
- title: 'Hypotheses testing on infinite random graphs'
  abstract: 'Drawing on some recent results that provide the  formalism necessary to definite stationarity for infinite random graphs, this paper  initiates the study of statistical and learning questions pertaining to these objects. Specifically, a criterion for the existence of a  consistent test for complex hypotheses is presented, generalizing the corresponding results on time series. As an application, it is shown how one can test that a tree has the Markov property, or, more generally, to estimate its memory.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/ryabko17b.html
  PDF: http://proceedings.mlr.press/v76/ryabko17b/ryabko17b.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-ryabko17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Daniil
    family: Ryabko
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 400-411
  id: ryabko17b
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 400
  lastpage: 411
  published: 2017-10-11 00:00:00 +0000
- title: 'Scale-Invariant Unconstrained Online Learning'
  abstract: 'We consider a variant of online convex optimization in which both the instances (input vectors) and the comparator (weight vector) are unconstrained. We exploit a natural scale invariance symmetry in our unconstrained setting: the predictions of the optimal comparator are invariant under any linear transformation of the instances.  Our goal is to design online algorithms which also enjoy this property, i.e. are scale-invariant. We start with the case of coordinate-wise invariance, in which the individual coordinates (features) can be arbitrarily rescaled.  We give an algorithm, which achieves essentially optimal regret bound in this setup, expressed by means of a coordinate-wise scale-invariant norm of the comparator. We then study general invariance with respect to arbitrary linear transformations.  We first give a negative result, showing that no algorithm can achieve a meaningful bound in terms of scale-invariant norm of the comparator in the worst case. Next, we compliment this result with a positive one, providing an algorithm which "almost" achieves the desired bound, incurring only a logarithmic overhead in terms of the norm of the instances.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/kot%C5%82owski17a.html
  PDF: http://proceedings.mlr.press/v76/kotłowski17a/kotłowski17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-kotłowski17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Wojciech
    family: Kotłowski
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 412-433
  id: kotłowski17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 412
  lastpage: 433
  published: 2017-10-11 00:00:00 +0000
- title: 'Learning MSO-definable hypotheses on strings'
  abstract: 'We study the classification problems over string data for hypotheses specified by formulas of monadic second-order logic MSO. The goal is to design learning algorithms that run in time polynomial in the size of the training set, independently of or at least sublinear in the size of the whole data set. We prove negative as well as positive results. If the data set is an unprocessed string to which our algorithms have local access, then learning in sublinear time is impossible even for hypotheses definable in a small fragment of first-order logic. If we allow for a linear time pre-processing of the string data to build an index data structure, then learning of MSO-definable hypotheses is possible in time polynomial in the size of the training set, independently of the size of the whole data set.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/grohe17a.html
  PDF: http://proceedings.mlr.press/v76/grohe17a/grohe17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-grohe17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Martin
    family: Grohe
  - given: Christof
    family: Löding
  - given: Martin
    family: Ritzert
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 434-451
  id: grohe17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 434
  lastpage: 451
  published: 2017-10-11 00:00:00 +0000
- title: 'The Power of Random Counterexamples'
  abstract: 'Learning a target concept from a finite $n \times m$ concept space requires $\Omega{(n)}$ proper equivalence queries in the worst case. We propose a variation of the usual equivalence query in which the teacher is constrained to choose counterexamples randomly from a known probability distribution on examples. We present and analyze the Max-Min learning algorithm, which identifies an arbitrary target concept in an arbitrary finite $n \times m$ concept space using at most an expected $\log_2{n}$ proper equivalence queries with random counterexamples.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/angluin17a.html
  PDF: http://proceedings.mlr.press/v76/angluin17a/angluin17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-angluin17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Dana
    family: Angluin
  - given: Tyler
    family: Dohrn
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 452-465
  id: angluin17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 452
  lastpage: 465
  published: 2017-10-11 00:00:00 +0000
- title: 'A Strongly Quasiconvex PAC-Bayesian Bound'
  abstract: 'We propose a new PAC-Bayesian bound and a way of constructing a hypothesis space, so that the bound is convex in the posterior distribution and also convex in a trade-off parameter between empirical performance of the posterior distribution and its complexity. The complexity is measured by the Kullback-Leibler divergence to a prior. We derive an alternating procedure for minimizing the bound. We show that the bound can be rewritten as a one-dimensional function of the trade-off parameter and provide sufficient conditions under which the function has a single global minimum. When the conditions are satisfied the alternating minimization is guaranteed to converge to the global minimum of the bound. We provide experimental results demonstrating that rigorous minimization of the bound is competitive with cross-validation in tuning the trade-off between complexity and empirical performance. In all our experiments the trade-off turned to be quasiconvex even when the sufficient conditions were violated.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/thiemann17a.html
  PDF: http://proceedings.mlr.press/v76/thiemann17a/thiemann17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-thiemann17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Niklas
    family: Thiemann
  - given: Christian
    family: Igel
  - given: Olivier
    family: Wintenberger
  - given: Yevgeny
    family: Seldin
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 466-492
  id: thiemann17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 466
  lastpage: 492
  published: 2017-10-11 00:00:00 +0000
- title: 'Normal Forms in Semantic Language Identification'
  abstract: 'We consider language learning in the limit from text where all learning restrictions are semantic, that is, where any conjecture may be replaced by a semantically equivalent conjecture. For different such learning criteria, starting with the well-known $\mathbf{Txt}\mathbf{G}\mathbf{Bc}$-learning, we consider three different normal forms: strongly locking learning, consistent learning and (partially) set-driven learning. These normal forms support and simplify proofs and give insight into what behaviors are necessary for successful learning (for example when consistency in conservative learning implies cautiousness and strong decisiveness). <br><br>  We show that strongly locking learning can be assumed for partially set-driven learners, even when learning restrictions apply. We give a very general proof relying only on a natural property of the learning restriction, namely, allowing for simulation on equivalent text. Furthermore, when no restrictions apply, also the converse is true: every strongly locking learner can be made partially set-driven. For several semantic learning criteria we show that learning can be done consistently. Finally, we deduce for which learning restrictions partial set-drivenness and set-drivenness coincide, including a general statement about classes of infinite languages. 	The latter again relies on a simulation argument.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/k%C3%B6tzing17a.html
  PDF: http://proceedings.mlr.press/v76/kötzing17a/kötzing17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-kötzing17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Timo
    family: Kötzing
  - given: Martin
    family: Schirneck
  - given: Karen
    family: Seidel
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 493-516
  id: kötzing17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 493
  lastpage: 516
  published: 2017-10-11 00:00:00 +0000
- title: 'Efficient tracking of a growing number of experts'
  abstract: 'We consider a variation on the problem of prediction with expert advice, where new forecasters that were unknown until then may appear at each round. As often in prediction with expert advice, designing an algorithm that achieves near-optimal regret guarantees is straightforward, using aggregation of experts. However, when the comparison class is sufficiently rich, for instance when the best expert and the set of experts itself changes over time, such strategies naively require to maintain a prohibitive number of weights (typically exponential with the time horizon). By contrast, designing strategies that both achieve a near-optimal regret and maintain a reasonable number of weights is highly non-trivial. We consider three increasingly challenging objectives (simple regret, shifting regret and sparse shifting regret) that extend existing notions defined for a fixed expert ensemble; in each case, we design strategies that achieve tight regret bounds, adaptive to the parameters of the comparison class, while being computationally inexpensive. Moreover, our algorithms are anytime, agnostic to the number of incoming experts and completely parameter-free. Such remarkable results are made possible thanks to two simple but highly effective recipes: first the "abstention trick" that comes from the <i>specialist</i> framework and enables to handle the least challenging notions of regret, but is limited when addressing more sophisticated objectives. Second, the "muting trick" that we introduce to give more flexibility. We show how to combine these two tricks in order to handle the most challenging class of comparison strategies.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/mourtada17a.html
  PDF: http://proceedings.mlr.press/v76/mourtada17a/mourtada17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-mourtada17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Jaouad
    family: Mourtada
  - given: Odalric-Ambrym
    family: Maillard
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 517-539
  id: mourtada17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 517
  lastpage: 539
  published: 2017-10-11 00:00:00 +0000
- title: 'Tight Bounds on $\ell_1$ Approximation and Learning of Self-Bounding Functions'
  abstract: 'We study the complexity of learning and approximation of self-bounding functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Informally, a function $f:\{0,1\}^n \rightarrow \mathbb{R}$ is self-bounding if for every $x \in \{0,1\}^n$, $f(x)$ upper bounds the sum of all the $n$ marginal decreases in the value of the function at $x$. Self-bounding functions include such well-known classes of functions as submodular and fractionally-subadditive (XOS) functions. They were introduced by Boucheron et al. in the context of concentration of measure inequalities. Our main result is a  nearly tight $\ell_1$-approximation of self-bounding functions by low-degree juntas. Specifically, all self-bounding functions can be $\epsilon$-approximated in $\ell_1$ by a polynomial of degree $\tilde{O}(1/\epsilon)$ over $2^{\tilde{O}(1/\epsilon)}$ variables. We show that both the degree and junta-size are optimal up to logarithmic terms. Previous techniques considered stronger $\ell_2$ approximation and proved nearly tight bounds of  $\Theta(1/\epsilon^{2})$ on the degree and $2^{\Theta(1/\epsilon^2)}$ on the number of variables. Our bounds rely on the analysis of noise stability of self-bounding functions together with a stronger connection between noise stability and $\ell_1$ approximation by low-degree polynomials. This technique can also be used to get tighter bounds on $\ell_1$ approximation by low-degree polynomials and faster learning algorithm for halfspaces. \newline These results lead to improved and in several cases almost tight bounds for PAC and agnostic learning of self-bounding functions relative to the uniform distribution. In particular, assuming hardness of learning juntas, we show that PAC and agnostic learning of self-bounding functions have complexity of $n^{\tilde{\Theta}(1/\epsilon)}$.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/feldman17a.html
  PDF: http://proceedings.mlr.press/v76/feldman17a/feldman17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-feldman17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Vitaly
    family: Feldman
  - given: Pravesh
    family: Kothari
  - given: Jan
    family: Vondrák
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 540-559
  id: feldman17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 540
  lastpage: 559
  published: 2017-10-11 00:00:00 +0000
- title: 'Relative Error Embeddings of the Gaussian Kernel Distance'
  abstract: 'A reproducing kernel defines an embedding of a data point into an infinite dimensional reproducing kernel Hilbert space (RKHS).  The norm in this space describes a distance, which we call the kernel distance.  The random Fourier features (of Rahimi and Recht) describe an oblivious approximate mapping into finite dimensional Euclidean space that behaves similar to the RKHS.  We show in this paper that for the Gaussian kernel the Euclidean norm between these mapped to features has $(1+\varepsilon)$-relative error with respect to the kernel distance.  When there are $n$ data points, we show that $O((1/\varepsilon^2) \log n)$ dimensions of the approximate feature space are sufficient and necessary. Without a bound on $n$, but when the original points lie in $\mathbb{R}^d$ and have diameter bounded by $\mathcal{M}$, then we show that $O((d/\varepsilon^2) \log \mathcal{M})$ dimensions are sufficient, and that this many are required, up to $\log(1/\varepsilon)$ factors.	We empirically confirm that relative error is indeed preserved for kernel PCA using these approximate feature maps.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/chen17a.html
  PDF: http://proceedings.mlr.press/v76/chen17a/chen17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-chen17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Di
    family: Chen
  - given: Jeff M.
    family: Phillips
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 560-576
  id: chen17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 560
  lastpage: 576
  published: 2017-10-11 00:00:00 +0000
- title: 'Adaptive Submodularity with Varying Query Sets: An Application to Active Multi-label Learning'
  abstract: 'Adaptive submodular optimization, where a sequence of items is selected adaptively to optimize a submodular function, has been found to have many applications from sensor placement to active learning. In the current paper, we extend this work to the setting of multiple queries at each time step, where the set of available queries is randomly constrained. A primary contribution of this paper is to prove the first near optimal approximation bound for a greedy policy in this setting. A natural application of this framework is to crowd-sourced active learning problem where the set of available experts and examples might vary randomly. We instantiate the new framework for multi-label learning and evaluate it in multiple benchmark domains with promising results.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/fern17a.html
  PDF: http://proceedings.mlr.press/v76/fern17a/fern17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-fern17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Alan
    family: Fern
  - given: Robby
    family: Goetschalckx
  - given: Mandana
    family: Hamidi-Haines
  - given: Prasad
    family: Tadepalli
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 577-592
  id: fern17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 577
  lastpage: 592
  published: 2017-10-11 00:00:00 +0000
- title: 'Structured Best Arm Identification with Fixed Confidence'
  abstract: 'We study the problem of identifying the best action among a set of possible options when the value of each action is given by a mapping from a number of noisy micro-observables in the so-called fixed confidence setting. Our main motivation is the application to minimax game search, which has been a major topic of interest in artificial intelligence. In this paper we introduce an abstract setting to clearly describe the essential properties of the problem. While previous work only considered a two-move-deep game tree search problem, our abstract setting can be applied to the general minimax games where the depth can be non-uniform and arbitrary, and transpositions are allowed. We introduce a new algorithm (LUCB-micro) for the abstract setting, and give its lower and upper sample complexity results. Our bounds recover some previous results, achieved in more limited settings, and also shed further light on how the structure of minimax problems influences sample complexity.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/huang17a.html
  PDF: http://proceedings.mlr.press/v76/huang17a/huang17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-huang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Ruitong
    family: Huang
  - given: Mohammad M.
    family: Ajallooeian
  - given: Csaba
    family: Szepesvári
  - given: Martin
    family: Müller
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 593-616
  id: huang17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 593
  lastpage: 616
  published: 2017-10-11 00:00:00 +0000
- title: 'On Compressive Ensemble Induced Regularisation: How Close is the Finite Ensemble Precision Matrix to the Infinite Ensemble?'
  abstract: 'Averaging ensembles of randomly oriented low-dimensional projections of a singular covariance represent a novel and attractive means to obtain a well-conditioned inverse, which only needs access to random projections of the data. However, theoretical analyses so far have only been done at convergence, implying good properties for `large-enough'' ensembles. But how large is `large enough''? Here we bound the expected difference in spectral norm between the finite ensemble precision matrix and the infinite ensemble, and based on this we give an estimate of the required ensemble size to guarantee the approximation error of the finite ensemble is below a given tolerance. Under mild assumptions, we find that for any given tolerance, the ensemble only needs to grow linearly in the original data dimension. A technical ingredient of our analysis is to upper bound the spectral norm of a matrix-variate T,  which we then  employ in conjunction with specific results from random matrix theory regarding the estimation of the covariance of random matrices.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/kab%C3%A1n17a.html
  PDF: http://proceedings.mlr.press/v76/kabán17a/kabán17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-kabán17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Ata
    family: Kabán
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 617-628
  id: kabán17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 617
  lastpage: 628
  published: 2017-10-11 00:00:00 +0000
- title: 'Dealing with Range Anxiety in Mean Estimation via Statistical Queries'
  abstract: 'We give algorithms for estimating the expectation of a given real-valued function $\phi:X\to \mathbb{R}$ on a sample drawn randomly from some unknown distribution $D$ over domain $X$, namely $\mathbf{E}_{\mathbf{x}\sim D}[\phi(\mathbf{x})]$. Our algorithms work in two well-studied models of restricted access to data samples. The first one is the statistical query (SQ) model in which an algorithm has access to an <i>SQ oracle</i> for the input distribution $D$ over $X$ instead of i.i.d. samples from $D$. Given a query function $\phi:X \to [0,1]$, the oracle returns an estimate of $\mathbf{E}_{\mathbf{x}\sim D}[\phi(\mathbf{x})]$ within some tolerance $\tau$. The second, is a  model in which only a single bit is communicated from each sample. In both of these models the error obtained using a naive implementation would scale polynomially with the range of the random variable $\phi(\mathbf{x})$ (which might even be infinite). In contrast, without restrictions on access to data the expected error scales with the standard deviation of $\phi(\mathbf{x})$. Here we give a simple algorithm whose error scales linearly in standard deviation of $\phi(\mathbf{x})$ and logarithmically with an upper bound on the second moment of $\phi(\mathbf{x})$. <br><br> As corollaries, we obtain algorithms for high dimensional mean estimation and stochastic convex optimization in these models that work in more general settings than previously known solutions.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/feldman17b.html
  PDF: http://proceedings.mlr.press/v76/feldman17b/feldman17b.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-feldman17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Vitaly
    family: Feldman
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 629-640
  id: feldman17b
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 629
  lastpage: 640
  published: 2017-10-11 00:00:00 +0000
- title: 'Learning from Networked Examples'
  abstract: 'Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may share some common objects, and hence share the features of these shared objects. We show that the classic approach of ignoring this problem potentially can have a harmful effect on the accuracy of statistics, and then consider alternatives. One of these is to only use independent examples, discarding other information. However, this is clearly suboptimal. We analyze sample error bounds in this networked setting, providing significantly improved results. An important component of our approach is formed by efficient sample weighting schemes, which leads to novel concentration inequalities.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/wang17a.html
  PDF: http://proceedings.mlr.press/v76/wang17a/wang17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-wang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Yuyi
    family: Wang
  - given: Zheng-Chu
    family: Guo
  - given: Jan
    family: Ramon
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 641-666
  id: wang17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 641
  lastpage: 666
  published: 2017-10-11 00:00:00 +0000
- title: 'PAC Learning Depth-3 $\textrm{AC}^0$ Circuits of Bounded Top Fanin'
  abstract: 'An important and long-standing question in computational learning theory is how to learn $\textrm{AC}^0$ circuits with respect to any distribution (i.e. PAC learning). All previous results either require that the underlying distribution is uniform Linial et al. (1993) (or simple variants of the uniform distribution) or restrict the depths of circuits being learned to 1 Valiant (1984) and 2 Klivans and Servedio (2004). As for the circuits of depth 3 or more, it is currently unknown how to PAC learn them. \newline In this paper we present an algorithm to PAC learn depth-3 $\textrm{AC}^0$ circuits of bounded top fanin over $(x_1,\cdots,x_n,\overline{x}_1,\cdots,\overline{x}_n)$. Our result is that every depth-3 $\textrm{AC}^0$ circuit of top fanin $K$ can be computed by a polynomial threshold function (PTF) of degree $\widetilde{O}(K\cdot n^{\frac{1}{2}})$, which means that it can be PAC learned in time $2^{\widetilde{O}(K\cdot n^{\frac{1}{2}})}$. In particular, when $K=O(n^{\epsilon_0})$ for any $\epsilon_0<\frac{1}{2}$, the time for learning is sub-exponential. We note that instead of employing some known tools we use some specific approximation in expressing such circuits in PTFs which can thus save a factor of $\textrm{polylog}(n)$ in degrees of the PTFs.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/ding17a.html
  PDF: http://proceedings.mlr.press/v76/ding17a/ding17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-ding17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Ning
    family: Ding
  - given: Yanli
    family: Ren
  - given: Dawu
    family: Gu
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 667-680
  id: ding17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 667
  lastpage: 680
  published: 2017-10-11 00:00:00 +0000
- title: 'A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds'
  abstract: 'Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms. In this paper we contribute to this effort in two ways: First, based on a new regret decomposition and a generalization of Bregman divergences, we provide a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of Mirror Descent (MD) and the Follow-the-Regularized-Leader (FTRL) algorithms. The analysis is done with extra care so as not to introduce assumptions not needed in the proofs and allows to combine, in a straightforward way, different algorithmic ideas (e.g., adaptivity, optimism, implicit updates) and learning settings (e.g., strongly convex or composite objectives). This way we are able to reprove, extend and refine a large body of the literature, while keeping the proofs concise. The second contribution is a byproduct of this careful analysis: We present algorithms with improved variational bounds for smooth, composite objectives, including a new family of optimistic MD algorithms with only one projection step per round. Furthermore, we provide a simple extension of adaptive regret bounds to practically relevant non-convex problem settings with essentially no extra effort.'
  volume: 76
  URL: https://proceedings.mlr.press/v76/joulani17a.html
  PDF: http://proceedings.mlr.press/v76/joulani17a/joulani17a.pdf
  edit: https://github.com/mlresearch//v76/edit/gh-pages/_posts/2017-10-11-joulani17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 28th International Conference on Algorithmic Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Pooria
    family: Joulani
  - given: András
    family: György
  - given: Csaba
    family: Szepesvári
  editor: 
  - given: Steve
    family: Hanneke
  - given: Lev
    family: Reyzin
  page: 681-720
  id: joulani17a
  issued:
    date-parts: 
      - 2017
      - 10
      - 11
  firstpage: 681
  lastpage: 720
  published: 2017-10-11 00:00:00 +0000