Hierarchical Mixtures-of-Experts for Generalized Linear Models: Some Results on Denseness and Consistency
Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, PMLR R2, 1999.
Abstract
We investigate a class of hierarchical mixtures-of-experts (HME) models in which exponential family regression models with generalized linear mean functions of the form ψ(a + xᵀb) are mixed, where ψ(⋅) is the inverse link function. Suppose the true response y follows an exponential family regression model whose mean function belongs to a class of smooth functions of the form ψ(h(x)) with h ∈ W_∞^{2;K₀} (a Sobolev class over [0,1]^s), where s is the dimension of the predictor x. It is shown that the HME mean functions can approximate the true mean function at a rate of O(m^{−2/s}) in the L_p norm, where m is the number of experts. Moreover, the HME probability density functions can approximate the true density at a rate of O(m^{−2/s}) in Hellinger distance, and at a rate of O(m^{−4/s}) in Kullback-Leibler divergence. These rates can be achieved either within the family of HME structures organized as a tree of binary splits, or within the family of structures with a single layer of experts. It is also shown that likelihood-based inference for HME models is consistent in recovering the truth, in the sense that as the sample size n and the number of experts m both increase, the mean square error of the estimated mean response goes to zero. Conditions under which these results hold are stated and discussed.
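As a concrete illustration of the model class, below is a minimal NumPy sketch of the single-layer mixture-of-experts mean function described above: m generalized-linear experts ψ(a_j + xᵀb_j) combined by input-dependent gates. The softmax gating parameterization and all variable names are illustrative assumptions for exposition, not notation taken from the paper.

    import numpy as np

    def me_mean_function(x, gate_w, gate_b, expert_a, expert_b, inv_link):
        # Mean function of a single-layer mixture-of-experts model.
        # Each of the m experts is a generalized linear model with mean
        # inv_link(a_j + x^T b_j); their outputs are combined with
        # softmax gating weights that also depend on x (an assumed,
        # standard parameterization of the gates).
        scores = gate_w @ x + gate_b          # gating scores, shape (m,)
        scores -= scores.max()                # for numerical stability
        gates = np.exp(scores) / np.exp(scores).sum()

        # Expert means psi(a_j + x^T b_j), shape (m,).
        expert_means = inv_link(expert_a + expert_b @ x)

        # Mixture mean: sum_j g_j(x) * psi(a_j + x^T b_j).
        return gates @ expert_means

    # Example usage: m = 4 experts, s = 2 predictors, logistic inverse link
    # (the Bernoulli case; parameter values here are arbitrary).
    rng = np.random.default_rng(0)
    m, s = 4, 2
    f = me_mean_function(
        x=rng.uniform(size=s),
        gate_w=rng.normal(size=(m, s)), gate_b=rng.normal(size=m),
        expert_a=rng.normal(size=m), expert_b=rng.normal(size=(m, s)),
        inv_link=lambda t: 1.0 / (1.0 + np.exp(-t)),
    )

In the hierarchical variant with a tree of binary splits, the same construction is nested: each internal node of the tree carries a two-way gate of this form, and the generalized linear experts sit at the leaves.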