Minimum description length (MDL) regularization for online learning

Gil I. Shamir

Minimum description length (MDL) regularization for online learning

Gil I. Shamir

Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, PMLR 44:260-276, 2015.

Abstract

An approach inspired by the \emphMinimum Description Length (MDL) principle is proposed for adaptively selecting features during online learning based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, leading to improved loss. Several algorithmic variations on the approach are presented. They are based on using a Bayesian mixture in each of the dimensions of the feature space. By utilizing the MDL principle, the mixture reduces the dimensionality of the feature space to its subspace with the lowest loss. Bounds on the loss, derived, show that the loss for that subspace is essentially achieved. The approach can be tuned for trading off between model size and the loss incurred. Empirical results on large scale real-world systems demonstrate how it improves such tradeoffs. Huge model size reductions can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (translating to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that overfitting is eliminated by this approach.

Cite this Paper

BibTeX


@InProceedings{pmlr-v44-shamir15,
  title = 	 {Minimum description length ({MDL}) regularization for online learning},
  author = 	 {Shamir, Gil I.},
  booktitle = 	 {Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015},
  pages = 	 {260--276},
  year = 	 {2015},
  editor = 	 {Storcheus, Dmitry and Rostamizadeh, Afshin and Kumar, Sanjiv},
  volume = 	 {44},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Montreal, Canada},
  month = 	 {11 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v44/shamir15.pdf},
  url = 	 {https://proceedings.mlr.press/v44/shamir15.html},
  abstract = 	 {An approach inspired by the \emphMinimum Description Length (MDL) principle is proposed for adaptively selecting features during online learning based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, leading to improved loss. Several algorithmic variations on the approach are presented.  They are based on using a Bayesian mixture in each of the dimensions of the feature space.  By utilizing the MDL principle, the mixture reduces the dimensionality of the feature space to its subspace with the lowest loss.  Bounds on the loss, derived, show that the loss for that subspace is essentially achieved. The approach can be tuned for trading off between model size and the loss incurred. Empirical results on large scale real-world systems demonstrate how it improves such tradeoffs. Huge model size reductions can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (translating to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that overfitting is eliminated by this approach.}
}

Endnote

%0 Conference Paper
%T Minimum description length (MDL) regularization for online learning
%A Gil I. Shamir
%B Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
%C Proceedings of Machine Learning Research
%D 2015
%E Dmitry Storcheus
%E Afshin Rostamizadeh
%E Sanjiv Kumar	
%F pmlr-v44-shamir15
%I PMLR
%P 260--276
%U https://proceedings.mlr.press/v44/shamir15.html
%V 44
%X An approach inspired by the \emphMinimum Description Length (MDL) principle is proposed for adaptively selecting features during online learning based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, leading to improved loss. Several algorithmic variations on the approach are presented.  They are based on using a Bayesian mixture in each of the dimensions of the feature space.  By utilizing the MDL principle, the mixture reduces the dimensionality of the feature space to its subspace with the lowest loss.  Bounds on the loss, derived, show that the loss for that subspace is essentially achieved. The approach can be tuned for trading off between model size and the loss incurred. Empirical results on large scale real-world systems demonstrate how it improves such tradeoffs. Huge model size reductions can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (translating to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that overfitting is eliminated by this approach.

RIS


TY  - CPAPER
TI  - Minimum description length (MDL) regularization for online learning
AU  - Gil I. Shamir
BT  - Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
DA  - 2015/12/08
ED  - Dmitry Storcheus
ED  - Afshin Rostamizadeh
ED  - Sanjiv Kumar	
ID  - pmlr-v44-shamir15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 44
SP  - 260
EP  - 276
L1  - http://proceedings.mlr.press/v44/shamir15.pdf
UR  - https://proceedings.mlr.press/v44/shamir15.html
AB  - An approach inspired by the \emphMinimum Description Length (MDL) principle is proposed for adaptively selecting features during online learning based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, leading to improved loss. Several algorithmic variations on the approach are presented.  They are based on using a Bayesian mixture in each of the dimensions of the feature space.  By utilizing the MDL principle, the mixture reduces the dimensionality of the feature space to its subspace with the lowest loss.  Bounds on the loss, derived, show that the loss for that subspace is essentially achieved. The approach can be tuned for trading off between model size and the loss incurred. Empirical results on large scale real-world systems demonstrate how it improves such tradeoffs. Huge model size reductions can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (translating to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that overfitting is eliminated by this approach.
ER  -

APA


Shamir, G.I.. (2015). Minimum description length (MDL) regularization for online learning. Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, in Proceedings of Machine Learning Research 44:260-276 Available from https://proceedings.mlr.press/v44/shamir15.html.

Related Material

Download PDF