Generalized Optimal Reverse Prediction


Martha White, Dale Schuurmans ;
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:1305-1313, 2012.


Recently it has been shown that classical supervised and unsupervised training methods can be unified as special cases of so-called “optimal reverse prediction": predicting inputs from target labels while optimizing over both model parameters and missing labels. Although this perspective establishes links between classical training principles, the existing formulation only applies to linear predictors under squared loss, hence is extremely limited. We generalize the formulation of optimal reverse prediction to arbitrary Bregman divergences, and more importantly to nonlinear predictors. This extension is achieved by establishing a new, generalized form of forward-reverse minimization equivalence that holds for arbitrary matching losses. Several benefits follow. First, a new variant of Bregman divergence clustering can be recovered that incorporates a non-linear data reconstruction model. Second, normalized-cut and kernel-based extensions can be formulated coherently. Finally, a new semi-supervised training principle can be recovered for classification problems that demonstrates advantages over the state of the art.

Related Material