Minimum description length (MDL) regularization for online learning

Gil I. Shamir
Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, PMLR 44:260-276, 2015.

Abstract

An approach inspired by the Minimum Description Length (MDL) principle is proposed for adaptively selecting features during online learning based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, leading to improved loss. Several algorithmic variations on the approach are presented. They are based on using a Bayesian mixture in each of the dimensions of the feature space. By utilizing the MDL principle, the mixture reduces the feature space to the subspace with the lowest loss. Bounds on the loss are derived, showing that the loss for that subspace is essentially achieved. The approach can be tuned to trade off between model size and the loss incurred. Empirical results on large-scale real-world systems demonstrate how it improves such tradeoffs: huge model size reductions can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (translating to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that this approach eliminates overfitting.
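
The abstract describes per-coordinate Bayesian mixtures that gate features in and out during online learning, but the paper's concrete priors, update rules, and loss bounds are not reproduced on this page. The Python sketch below is therefore only one hypothetical reading of the idea, under stated assumptions: a two-component mixture per coordinate ("feature is noise", weight pinned to zero, versus "feature is useful", learned weight), with each hypothesis's posterior updated by the likelihood it assigns to the observed label, and low-posterior features pruned to shrink the model. All class, method, and parameter names are illustrative, not the paper's.

```python
# Illustrative sketch only -- NOT the paper's algorithm. One plausible
# instantiation of a per-coordinate on/off Bayesian mixture for online
# logistic regression, with MDL-style pruning of low-evidence features.
import numpy as np


class PerCoordinateMixtureLearner:
    """Hypothetical per-coordinate "noise vs. useful" mixture learner."""

    def __init__(self, dim, lr=0.1, prior_on=0.5, eps=1e-12):
        self.w = np.zeros(dim)                          # weights of the "useful" hypothesis
        self.log_on = np.full(dim, np.log(prior_on))    # log-evidence: feature is useful
        self.log_off = np.full(dim, np.log(1.0 - prior_on))  # log-evidence: feature is noise
        self.lr = lr
        self.eps = eps

    def _sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

    def _posterior_on(self):
        # Posterior probability that each coordinate is "on", from accumulated evidence.
        return np.exp(self.log_on - np.logaddexp(self.log_on, self.log_off))

    def predict_proba(self, x):
        # Mixture prediction: each weight is shrunk by its posterior of being useful.
        return self._sigmoid(np.dot(self._posterior_on() * self.w, x))

    def update(self, x, y):
        """One online step on example (x, y) with y in {0, 1}."""
        p_on = self._posterior_on()
        z = np.dot(p_on * self.w, x)
        for i in np.nonzero(x)[0]:
            # Score both hypotheses for coordinate i by the likelihood each
            # assigns to the observed label, holding other coordinates fixed.
            z_off = z - p_on[i] * self.w[i] * x[i]   # coordinate i excluded
            z_on = z_off + self.w[i] * x[i]          # coordinate i fully included
            for z_h, log_h in ((z_on, self.log_on), (z_off, self.log_off)):
                p = self._sigmoid(z_h)
                like = p if y == 1 else 1.0 - p
                log_h[i] += np.log(like + self.eps)  # in-place evidence update
        # Gradient step on the "useful" weights under the mixture prediction.
        self.w -= self.lr * (self._sigmoid(z) - y) * x

    def prune(self, threshold=0.01):
        # Drop coordinates whose posterior of being useful fell below the
        # threshold; this plays the role of the size/loss tradeoff knob here.
        drop = self._posterior_on() < threshold
        self.w[drop] = 0.0
        return int(drop.sum())


# Tiny synthetic demo: 5 informative features plus 45 pure-noise features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 50
    w_true = np.zeros(dim)
    w_true[:5] = rng.normal(0.0, 2.0, 5)
    learner = PerCoordinateMixtureLearner(dim)
    for _ in range(5000):
        x = rng.normal(size=dim)
        y = int(rng.random() < 1.0 / (1.0 + np.exp(-np.dot(w_true, x))))
        learner.update(x, y)
    print("features pruned:", learner.prune())
    print("posterior(on) for informative features:",
          np.round(learner._posterior_on()[:5], 2))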

Cite this Paper


BibTeX
@InProceedings{pmlr-v44-shamir15,
  title     = {Minimum description length ({MDL}) regularization for online learning},
  author    = {Shamir, Gil I.},
  booktitle = {Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015},
  pages     = {260--276},
  year      = {2015},
  editor    = {Storcheus, Dmitry and Rostamizadeh, Afshin and Kumar, Sanjiv},
  volume    = {44},
  series    = {Proceedings of Machine Learning Research},
  address   = {Montreal, Canada},
  month     = {11 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v44/shamir15.pdf},
  url       = {https://proceedings.mlr.press/v44/shamir15.html}
}
APA
Shamir, G. I. (2015). Minimum description length (MDL) regularization for online learning. Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, in Proceedings of Machine Learning Research 44:260-276. Available from https://proceedings.mlr.press/v44/shamir15.html.
