Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning

François Denis; Mattias Gybels; Amaury Habrard

Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning

François Denis, Mattias Gybels, Amaury Habrard

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):449-457, 2014.

Abstract

Learning probabilistic models over strings is an important issue for many applications. Spectral methods propose elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution. These methods rely on a singular value decomposition of a matrix H_S, called the Hankel matrix, that records the frequencies of (some of) the observed strings. The accuracy of the learned distribution depends both on the quantity of information embedded in H_S and on the distance between H_S and its mean H_r. Existing concentration bounds seem to indicate that the concentration over H_r gets looser with its size, suggesting to make a trade-off between the quantity of used information and the size of H_r. We propose new dimension-free concentration bounds for several variants of Hankel matrices. Experiments demonstrate that these bounds are tight and that they significantly improve existing bounds. These results suggest that the concentration rate of the Hankel matrix around its mean does not constitute an argument for limiting its size.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-denis14,
  title = 	 {Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning},
  author = 	 {Denis, François and Gybels, Mattias and Habrard, Amaury},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {449--457},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/denis14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/denis14.html},
  abstract = 	 {Learning probabilistic models over strings is an important issue for many applications. Spectral methods propose elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution. These methods rely on a singular value decomposition of a matrix H_S, called the Hankel matrix, that records the frequencies of (some of) the observed strings. The accuracy of the learned distribution depends both on the quantity of information embedded in H_S and on the distance between H_S and its mean H_r. Existing concentration bounds seem to indicate that the concentration over H_r gets looser with its size, suggesting to make a trade-off between the quantity of used information and the size of H_r. We propose new dimension-free concentration bounds for several variants of Hankel matrices. Experiments demonstrate that these bounds are tight and that they significantly improve existing bounds. These results suggest that the concentration rate of the Hankel matrix around its mean does not constitute an argument for limiting its size.}
}

Endnote

%0 Conference Paper
%T Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning
%A François Denis
%A Mattias Gybels
%A Amaury Habrard
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-denis14
%I PMLR
%P 449--457
%U https://proceedings.mlr.press/v32/denis14.html
%V 32
%N 1
%X Learning probabilistic models over strings is an important issue for many applications. Spectral methods propose elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution. These methods rely on a singular value decomposition of a matrix H_S, called the Hankel matrix, that records the frequencies of (some of) the observed strings. The accuracy of the learned distribution depends both on the quantity of information embedded in H_S and on the distance between H_S and its mean H_r. Existing concentration bounds seem to indicate that the concentration over H_r gets looser with its size, suggesting to make a trade-off between the quantity of used information and the size of H_r. We propose new dimension-free concentration bounds for several variants of Hankel matrices. Experiments demonstrate that these bounds are tight and that they significantly improve existing bounds. These results suggest that the concentration rate of the Hankel matrix around its mean does not constitute an argument for limiting its size.

RIS


TY  - CPAPER
TI  - Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning
AU  - François Denis
AU  - Mattias Gybels
AU  - Amaury Habrard
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-denis14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 449
EP  - 457
L1  - http://proceedings.mlr.press/v32/denis14.pdf
UR  - https://proceedings.mlr.press/v32/denis14.html
AB  - Learning probabilistic models over strings is an important issue for many applications. Spectral methods propose elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution. These methods rely on a singular value decomposition of a matrix H_S, called the Hankel matrix, that records the frequencies of (some of) the observed strings. The accuracy of the learned distribution depends both on the quantity of information embedded in H_S and on the distance between H_S and its mean H_r. Existing concentration bounds seem to indicate that the concentration over H_r gets looser with its size, suggesting to make a trade-off between the quantity of used information and the size of H_r. We propose new dimension-free concentration bounds for several variants of Hankel matrices. Experiments demonstrate that these bounds are tight and that they significantly improve existing bounds. These results suggest that the concentration rate of the Hankel matrix around its mean does not constitute an argument for limiting its size.
ER  -

APA


Denis, F., Gybels, M. & Habrard, A.. (2014). Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):449-457 Available from https://proceedings.mlr.press/v32/denis14.html.

Related Material

Download PDF