Hierarchical Mixtures-of-Experts for Generalized Linear Models: Some Results on Denseness and Consistency

Wenxin Jiang, Martin A. Tanner
Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, PMLR R2, 1999.

Abstract

We investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form $\psi(a+x^T b)$ are mixed. Here $\psi(\cdot)$ is the inverse link function. Suppose the true response $y$ follows an exponential family regression model with mean function belonging to a class of smooth functions of the form $\psi(h(x))$ where $h \in W_{2;K_0}^\infty$ (a Sobolev class over $[0,1]^{s}$). It is shown that the HME mean functions can approximate the true mean function, at a rate of $O(m^{-2/s})$ in $L_p$ norm. Moreover, the HME probability density functions can approximate the true density, at a rate of $O(m^{-2/s})$ in Hellinger distance, and at a rate of $O(m^{-4/s})$ in Kullback-Leibler divergence. These rates can be achieved within the family of HME structures with a tree of binary splits, or within the family of structures with a single layer of experts. Here $s$ is the dimension of the predictor $x$. It is also shown that likelihood-based inference based on HME is consistent in recovering the truth, in the sense that as the sample size $n$ and the number of experts $m$ both increase, the mean square error of the estimated mean response goes to zero. Conditions for such results to hold are stated and discussed.
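
To fix ideas, the display below sketches the single-layer mixture form referenced in the abstract, with softmax (multinomial logit) gating; the symbols $f_m$, $g_j$, $\pi$, $a_j$, $b_j$, $c_j$, $d_j$ are illustrative notation, not necessarily the paper's.

$$
f_m(y \mid x) \;=\; \sum_{j=1}^{m} g_j(x)\, \pi\bigl(y;\, \psi(a_j + x^T b_j)\bigr),
\qquad
g_j(x) \;=\; \frac{\exp(c_j + x^T d_j)}{\sum_{k=1}^{m} \exp(c_k + x^T d_k)},
$$

where $\pi(y; \mu)$ denotes the exponential family density with mean $\mu$ and $m$ is the number of experts. In this notation, the abstract's rates say that for a true density $f(y \mid x) = \pi(y; \psi(h(x)))$ with $h \in W_{2;K_0}^\infty$, the parameters can be chosen so that $d_H(f, f_m) = O(m^{-2/s})$ in Hellinger distance and $D_{KL}(f \,\|\, f_m) = O(m^{-4/s})$ in Kullback-Leibler divergence.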

Cite this Paper


BibTeX
@InProceedings{pmlr-vR2-jiang99a,
  title     = {Hierarchical Mixtures-of-Experts for Generalized Linear Models: Some Results on Denseness and Consistency},
  author    = {Jiang, Wenxin and Tanner, Martin A.},
  booktitle = {Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics},
  year      = {1999},
  editor    = {Heckerman, David and Whittaker, Joe},
  volume    = {R2},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Jan},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/r2/jiang99a/jiang99a.pdf},
  url       = {https://proceedings.mlr.press/r2/jiang99a.html},
  abstract  = {We investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form $\psi(a+x^T b)$ are mixed. Here $\psi(\cdot)$ is the inverse link function. Suppose the true response $y$ follows an exponential family regression model with mean function belonging to a class of smooth functions of the form $\psi(h(x))$ where $h \in W_{2;K_0}^\infty$ (a Sobolev class over $[0,1]^{s}$). It is shown that the HME mean functions can approximate the true mean function, at a rate of $O(m^{-2/s})$ in $L_p$ norm. Moreover, the HME probability density functions can approximate the true density, at a rate of $O(m^{-2/s})$ in Hellinger distance, and at a rate of $O(m^{-4/s})$ in Kullback-Leibler divergence. These rates can be achieved within the family of HME structures with a tree of binary splits, or within the family of structures with a single layer of experts. Here $s$ is the dimension of the predictor $x$. It is also shown that likelihood-based inference based on HME is consistent in recovering the truth, in the sense that as the sample size $n$ and the number of experts $m$ both increase, the mean square error of the estimated mean response goes to zero. Conditions for such results to hold are stated and discussed.},
  note      = {Reissued by PMLR on 20 August 2020.}
}
Endnote
%0 Conference Paper
%T Hierarchical Mixtures-of-Experts for Generalized Linear Models: Some Results on Denseness and Consistency
%A Wenxin Jiang
%A Martin A. Tanner
%B Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1999
%E David Heckerman
%E Joe Whittaker
%F pmlr-vR2-jiang99a
%I PMLR
%U https://proceedings.mlr.press/r2/jiang99a.html
%V R2
%X We investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form $\psi(a+x^T b)$ are mixed. Here $\psi(\cdot)$ is the inverse link function. Suppose the true response $y$ follows an exponential family regression model with mean function belonging to a class of smooth functions of the form $\psi(h(x))$ where $h \in W_{2;K_0}^\infty$ (a Sobolev class over $[0,1]^{s}$). It is shown that the HME mean functions can approximate the true mean function, at a rate of $O(m^{-2/s})$ in $L_p$ norm. Moreover, the HME probability density functions can approximate the true density, at a rate of $O(m^{-2/s})$ in Hellinger distance, and at a rate of $O(m^{-4/s})$ in Kullback-Leibler divergence. These rates can be achieved within the family of HME structures with a tree of binary splits, or within the family of structures with a single layer of experts. Here $s$ is the dimension of the predictor $x$. It is also shown that likelihood-based inference based on HME is consistent in recovering the truth, in the sense that as the sample size $n$ and the number of experts $m$ both increase, the mean square error of the estimated mean response goes to zero. Conditions for such results to hold are stated and discussed.
%Z Reissued by PMLR on 20 August 2020.
APA
Jiang, W. & Tanner, M.A. (1999). Hierarchical Mixtures-of-Experts for Generalized Linear Models: Some Results on Denseness and Consistency. Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R2. Available from https://proceedings.mlr.press/r2/jiang99a.html. Reissued by PMLR on 20 August 2020.

Related Material

Download PDF: http://proceedings.mlr.press/r2/jiang99a/jiang99a.pdf