Hierarchical Mixtures-of-Experts for Generalized Linear Models: Some Results on Denseness and Consistency
Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, PMLR R2, 1999.
Abstract
We investigate a class of hierarchical mixtures-of-experts (HME) models in which exponential family regression models with generalized linear mean functions of the form $\psi(a+x^T b)$ are mixed, where $\psi(\cdot)$ is the inverse link function. Suppose the true response $y$ follows an exponential family regression model whose mean function belongs to a class of smooth functions of the form $\psi(h(x))$, where $h \in W_{2;K_0}^\infty$ (a Sobolev class over $[0,1]^{s}$) and $s$ is the dimension of the predictor $x$. It is shown that the HME mean functions can approximate the true mean function at a rate of $O(m^{-2/s})$ in the $L_p$ norm, where $m$ is the number of experts. Moreover, the HME probability density functions can approximate the true density at a rate of $O(m^{-2/s})$ in Hellinger distance and at a rate of $O(m^{-4/s})$ in Kullback-Leibler divergence. These rates can be achieved either within the family of HME structures organized as a tree of binary splits or within the family of structures with a single layer of experts. It is also shown that likelihood-based inference with HME models is consistent in recovering the truth, in the sense that the mean square error of the estimated mean response goes to zero as the sample size $n$ and the number of experts $m$ both increase. Conditions for these results to hold are stated and discussed.
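For concreteness, the single-layer case of this family can be written in the following standard mixture form (a sketch only: the multinomial-logit gating follows the usual Jordan-Jacobs construction, and the symbols $g_j$, $v_j$, and $\pi$ are our notation rather than the paper's):

$$f(y \mid x) \;=\; \sum_{j=1}^{m} g_j(x)\, \pi\bigl(y;\, \psi(a_j + x^T b_j)\bigr), \qquad g_j(x) \;=\; \frac{\exp(v_{j0} + x^T v_j)}{\sum_{k=1}^{m} \exp(v_{k0} + x^T v_k)},$$

where $\pi(y;\mu)$ denotes the exponential family density with mean $\mu$. The HME mean function that the denseness results compare with the true $\psi(h(x))$ is then $\sum_{j=1}^{m} g_j(x)\, \psi(a_j + x^T b_j)$.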