Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers

Dani Yogatama; Noah Smith

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):656-664, 2014.

Abstract

In many high-dimensional learning problems, only some parts of an observation are important to the prediction task; for example, the cues for correctly categorizing a document may lie in a handful of its sentences. We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer. Specifically, we apply the sparse overlapping group lasso with one group for every bundle of features occurring together in a training-data sentence, leading to thousands to millions of overlapping groups. We show how to efficiently solve the resulting optimization problem using the alternating direction method of multipliers. We find that the resulting method significantly outperforms competitive baselines (standard ridge, lasso, and elastic net regularizers) on a suite of real-world text categorization problems.
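The abstract describes the regularizer only in words. A rough illustration follows, not the authors' code. Let w be the feature weights and let each training sentence define a group g of feature indices. For binary logistic regression, a sparse overlapping group lasso objective can then be written as: minimize over w the mean of log(1 + exp(-y_i x_i.w)) + sum_g lambda_g ||w_g||_2 + lambda_las ||w||_1. ADMM handles the overlap by giving each group its own copy v_g, constrained to equal w_g.

The sketch below rests on several illustrative assumptions:

- the loss is logistic;
- the group weights are lambda_g = lam_glas * sqrt(|g|);
- the lasso term is folded in as extra singleton groups;
- the w-update is inexact, made of a few gradient steps.

The function name `sentence_reg_admm` and its parameters are hypothetical.

```python
import numpy as np


def sentence_reg_admm(X, y, groups, lam_glas=0.1, lam_las=0.01, rho=1.0,
                      n_iter=200, inner_steps=5):
    """Hypothetical sketch: logistic regression with a sparse overlapping
    group lasso penalty, solved by (inexact) ADMM.

    X: (N, d) features; y: labels in {-1, +1};
    groups: list of feature-index lists, e.g. one per training sentence.
    """
    N, d = X.shape
    # Singleton groups carry the lasso term: for a 1-d group, ||v||_2 == |v|.
    all_groups = [np.asarray(g, dtype=int) for g in groups]
    all_groups += [np.array([j]) for j in range(d)]
    # Assumed weighting: sentence groups scaled by sqrt(group size).
    weights = np.array([lam_glas * np.sqrt(len(g)) for g in groups] + [lam_las] * d)

    # Flatten the copies: slot k copies feature idx[k] and belongs to group gid[k].
    idx = np.concatenate(all_groups)
    gid = np.concatenate([np.full(len(g), i) for i, g in enumerate(all_groups)])
    v = np.zeros(len(idx))   # per-group copies of w
    u = np.zeros(len(idx))   # scaled dual variables
    w = np.zeros(d)

    # Lipschitz bound of the smooth w-subproblem gives a safe gradient step.
    counts = np.bincount(idx, minlength=d)  # number of groups per feature
    lip = np.linalg.norm(X, 2) ** 2 / (4 * N) + rho * counts.max()
    step = 1.0 / lip

    for _ in range(n_iter):
        # w-update (inexact): gradient steps on
        #   mean logistic loss + (rho/2) * sum_g ||w_g - v_g + u_g||^2
        for _ in range(inner_steps):
            margins = y * (X @ w)
            # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)), computed stably
            coef = -y * np.exp(-np.logaddexp(0.0, margins))
            grad = X.T @ coef / N
            grad += rho * np.bincount(idx, weights=w[idx] - v + u, minlength=d)
            w -= step * grad
        # v-update: group soft-thresholding (prox of the weighted l2 norms)
        a = w[idx] + u
        norms = np.sqrt(np.bincount(gid, weights=a ** 2, minlength=len(all_groups)))
        shrink = np.maximum(0.0, 1.0 - (weights / rho) / np.maximum(norms, 1e-12))
        v = shrink[gid] * a
        # Dual ascent on the consensus constraint v_g = w_g
        u += w[idx] - v
    return w
```

Every penalty acts on a copy v_g, so the v-update reduces to a closed-form shrinkage per group. Only the smooth w-subproblem needs an iterative solver. The copies v are exactly sparse after shrinkage, whereas the returned w is only approximately so.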

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-yogatama14,
  title = 	 {Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers},
  author = 	 {Yogatama, Dani and Smith, Noah},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {656--664},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/yogatama14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/yogatama14.html},
  abstract = 	 {In many high-dimensional learning problems, only some parts of an observation are important to the prediction task; for example, the cues for correctly categorizing a document may lie in a handful of its sentences. We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer. Specifically, we apply the sparse overlapping group lasso with one group for every bundle of features occurring together in a training-data sentence, leading to thousands to millions of overlapping groups. We show how to efficiently solve the resulting optimization problem using the alternating direction method of multipliers. We find that the resulting method significantly outperforms competitive baselines (standard ridge, lasso, and elastic net regularizers) on a suite of real-world text categorization problems.}
}

Endnote

%0 Conference Paper
%T Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers
%A Dani Yogatama
%A Noah Smith
%B Proceedings of the 31st International Conference on Machine Learning
%C Beijing, China
%S Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-yogatama14
%I PMLR
%P 656-664
%U https://proceedings.mlr.press/v32/yogatama14.html
%V 32
%N 1
%X In many high-dimensional learning problems, only some parts of an observation are important to the prediction task; for example, the cues for correctly categorizing a document may lie in a handful of its sentences. We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer. Specifically, we apply the sparse overlapping group lasso with one group for every bundle of features occurring together in a training-data sentence, leading to thousands to millions of overlapping groups. We show how to efficiently solve the resulting optimization problem using the alternating direction method of multipliers. We find that the resulting method significantly outperforms competitive baselines (standard ridge, lasso, and elastic net regularizers) on a suite of real-world text categorization problems.

RIS


TY  - CPAPER
TI  - Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers
AU  - Dani Yogatama
AU  - Noah Smith
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-yogatama14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 656
EP  - 664
L1  - http://proceedings.mlr.press/v32/yogatama14.pdf
UR  - https://proceedings.mlr.press/v32/yogatama14.html
AB  - In many high-dimensional learning problems, only some parts of an observation are important to the prediction task; for example, the cues for correctly categorizing a document may lie in a handful of its sentences. We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer. Specifically, we apply the sparse overlapping group lasso with one group for every bundle of features occurring together in a training-data sentence, leading to thousands to millions of overlapping groups. We show how to efficiently solve the resulting optimization problem using the alternating direction method of multipliers. We find that the resulting method significantly outperforms competitive baselines (standard ridge, lasso, and elastic net regularizers) on a suite of real-world text categorization problems.
ER  -

APA


Yogatama, D., & Smith, N. (2014). Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):656-664. Available from https://proceedings.mlr.press/v32/yogatama14.html.