Finding Overlapping Distributions with MML

Rohan A. Baxter, Jonathan J. Oliver
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:23-30, 1997.

Abstract

This paper considers an aspect of mixture modelling. Previous studies have shown minimum message length (MML) estimation to perform well in a wide variety of mixture modelling problems, including determining the number of components which best describes some data. In this paper, we focus on the difficult problem of overlapping components. An advantage of the probabilistic mixture modelling approach is its ability to identify models where the components overlap and data items can belong probabilistically to more than one component. Significantly overlapping distributions require more data for their parameters to be accurately estimated than well separated distributions. For example, two Gaussian distributions are considered to significantly overlap when their means are within three standard deviations of each other. If insufficient data is available, only a single component distribution will be estimated, although the data originates from two component distributions. In this paper, we quantify this difficulty in terms of the number of data items needed for the MML criterion to 'discover' two overlapping components. First, we perform experiments which compare the MML criterion's performance relative to other Bayesian criteria based on MCMC sampling. Second, we make two alterations to the existing MML estimates in order to improve its performance on overlapping distributions. Experiments are performed with the new estimates to confirm that they are effective.
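The overlap rule of thumb stated in the abstract (means within three standard deviations of each other) can be illustrated with a minimal sketch. This is not code from the paper: the function name `significantly_overlap`, the `threshold` parameter, and the choice to measure the gap against the larger of the two standard deviations are all assumptions made for illustration, since the abstract does not say which standard deviation applies when the two differ.

```python
def significantly_overlap(mean_a, std_a, mean_b, std_b, threshold=3.0):
    """Return True when the two means lie within `threshold` standard deviations.

    Assumption (not from the paper): the larger of the two standard
    deviations is used as the scale for the comparison.
    """
    if std_a <= 0 or std_b <= 0:
        raise ValueError("standard deviations must be positive")
    scale = max(std_a, std_b)
    return abs(mean_a - mean_b) <= threshold * scale


if __name__ == "__main__":
    # Two unit-variance Gaussians whose means are 2 apart: inside the 3-sigma rule.
    print(significantly_overlap(0.0, 1.0, 2.0, 1.0))
    # The same Gaussians with means 5 apart: outside the 3-sigma rule.
    print(significantly_overlap(0.0, 1.0, 5.0, 1.0))
```

Under this reading, the first pair would count as significantly overlapping and the second would not. Other conventions, such as a pooled standard deviation, would change the borderline cases.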

Cite this Paper


BibTeX
@InProceedings{pmlr-vR1-baxter97a,
  title = {Finding Overlapping Distributions with MML},
  author = {Baxter, Rohan A. and Oliver, Jonathan J.},
  booktitle = {Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics},
  pages = {23--30},
  year = {1997},
  editor = {Madigan, David and Smyth, Padhraic},
  volume = {R1},
  series = {Proceedings of Machine Learning Research},
  month = {04--07 Jan},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/r1/baxter97a/baxter97a.pdf},
  url = {https://proceedings.mlr.press/r1/baxter97a.html},
  abstract = {This paper considers an aspect of mixture modelling. Previous studies have shown minimum message length (MML) estimation to perform well in a wide variety of mixture modelling problems, including determining the number of components which best describes some data. In this paper, we focus on the difficult problem of overlapping components. An advantage of the probabilistic mixture modelling approach is its ability to identify models where the components overlap and data items can belong probabilistically to more than one component. Significantly overlapping distributions require more data for their parameters to be accurately estimated than well separated distributions. For example, two Gaussian distributions are considered to significantly overlap when their means are within three standard deviations of each other. If insufficient data is available, only a single component distribution will be estimated, although the data originates from two component distributions. In this paper, we quantify this difficulty in terms of the number of data items needed for the MML criterion to 'discover' two overlapping components. First, we perform experiments which compare the MML criterion's performance relative to other Bayesian criteria based on MCMC sampling. Second, we make two alterations to the existing MML estimates in order to improve its performance on overlapping distributions. Experiments are performed with the new estimates to confirm that they are effective.},
  note = {Reissued by PMLR on 30 March 2021.}
}
Endnote
%0 Conference Paper
%T Finding Overlapping Distributions with MML
%A Rohan A. Baxter
%A Jonathan J. Oliver
%B Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1997
%E David Madigan
%E Padhraic Smyth
%F pmlr-vR1-baxter97a
%I PMLR
%P 23--30
%U https://proceedings.mlr.press/r1/baxter97a.html
%V R1
%X This paper considers an aspect of mixture modelling. Previous studies have shown minimum message length (MML) estimation to perform well in a wide variety of mixture modelling problems, including determining the number of components which best describes some data. In this paper, we focus on the difficult problem of overlapping components. An advantage of the probabilistic mixture modelling approach is its ability to identify models where the components overlap and data items can belong probabilistically to more than one component. Significantly overlapping distributions require more data for their parameters to be accurately estimated than well separated distributions. For example, two Gaussian distributions are considered to significantly overlap when their means are within three standard deviations of each other. If insufficient data is available, only a single component distribution will be estimated, although the data originates from two component distributions. In this paper, we quantify this difficulty in terms of the number of data items needed for the MML criterion to 'discover' two overlapping components. First, we perform experiments which compare the MML criterion's performance relative to other Bayesian criteria based on MCMC sampling. Second, we make two alterations to the existing MML estimates in order to improve its performance on overlapping distributions. Experiments are performed with the new estimates to confirm that they are effective.
%Z Reissued by PMLR on 30 March 2021.
APA
Baxter, R.A. & Oliver, J.J. (1997). Finding Overlapping Distributions with MML. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R1:23-30. Available from https://proceedings.mlr.press/r1/baxter97a.html. Reissued by PMLR on 30 March 2021.

Related Material