Efficient Computing of Stochastic Complexity

Petri Kontkanen, Wray L. Buntine, Petri Myllymäki, Jorma Rissanen, Henry Tirri
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:171-178, 2003.

Abstract

Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form. The results suggest that due to its accuracy and efficiency, the new sharper approximation will be useful for a wide class of problems with discrete data.

Cite this Paper


BibTeX
@InProceedings{pmlr-vR4-kontkanen03a, title = {Efficient Computing of Stochastic Complexity}, author = {Kontkanen, Petri and Buntine, Wray L. and Myllym{\"{a}}ki, Petri and Rissanen, Jorma and Tirri, Henry}, booktitle = {Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics}, pages = {171--178}, year = {2003}, editor = {Bishop, Christopher M. and Frey, Brendan J.}, volume = {R4}, series = {Proceedings of Machine Learning Research}, month = {03--06 Jan}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/r4/kontkanen03a/kontkanen03a.pdf}, url = {https://proceedings.mlr.press/r4/kontkanen03a.html}, abstract = {Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form. The results suggest that due to its accuracy and efficiency, the new sharper approximation will be useful for a wide class of problems with discrete data.}, note = {Reissued by PMLR on 01 April 2021.} }
Endnote
%0 Conference Paper %T Efficient Computing of Stochastic Complexity %A Petri Kontkanen %A Wray L. Buntine %A Petri Myllymäki %A Jorma Rissanen %A Henry Tirri %B Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2003 %E Christopher M. Bishop %E Brendan J. Frey %F pmlr-vR4-kontkanen03a %I PMLR %P 171--178 %U https://proceedings.mlr.press/r4/kontkanen03a.html %V R4 %X Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form. The results suggest that due to its accuracy and efficiency, the new sharper approximation will be useful for a wide class of problems with discrete data. %Z Reissued by PMLR on 01 April 2021.
APA
Kontkanen, P., Buntine, W.L., Myllymäki, P., Rissanen, J. & Tirri, H.. (2003). Efficient Computing of Stochastic Complexity. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R4:171-178 Available from https://proceedings.mlr.press/r4/kontkanen03a.html. Reissued by PMLR on 01 April 2021.

Related Material