Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees

Vitaly Feldman; Pravesh Kothari; Jan Vondrák

Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees

Vitaly Feldman, Pravesh Kothari, Jan Vondrák

Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:711-740, 2013.

Abstract

We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube {0,1}^n. Our main result is the following structural theorem: any submodular function is ε-close in \ell_2 to a real-valued decision tree (DT) of depth O(1/ε^2). This immediately implies that any submodular function is ε-close to a function of at most 2^O(1/ε^2) variables and has a spectral \ell_1 norm of 2^O(1/ε^2). It also implies the closest previous result that states that submodular functions can be approximated by polynomials of degree O(1/ε^2) (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank 4/ε^2 and a proof that any rank-r DT can be ε-approximated by a DT of depth \frac52(r+\log(1/ε)). We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time \tildeO(n^2) ⋅2^O(1/ε^4). The best previous algorithm for the problem requires n^O(1/ε^2) time and examples (Cheraghchi et al., 2012) but works also in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of 2^Ω(1/ε^2/3) on the complexity of learning monotone submodular functions in any reasonable model (including learning with value queries); (2) computational lower bound of n^Ω(1/ε^2/3) based on a reduction to learning of sparse parities with noise, widely-believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.

Cite this Paper

BibTeX


@InProceedings{pmlr-v30-Feldman13,
  title = 	 {Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees},
  author = 	 {Feldman, Vitaly and Kothari, Pravesh and Vondrák, Jan},
  booktitle = 	 {Proceedings of the 26th Annual Conference on Learning Theory},
  pages = 	 {711--740},
  year = 	 {2013},
  editor = 	 {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume = 	 {30},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Princeton, NJ, USA},
  month = 	 {12--14 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v30/Feldman13.pdf},
  url = 	 {https://proceedings.mlr.press/v30/Feldman13.html},
  abstract = 	 {We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube {0,1}^n. Our main result is the following structural theorem: any submodular function is ε-close in \ell_2 to a real-valued decision tree (DT) of depth O(1/ε^2). This immediately implies that any submodular function is  ε-close to a function of at most 2^O(1/ε^2) variables and has a spectral \ell_1 norm of 2^O(1/ε^2). It also implies the closest previous result that states that submodular functions can be approximated by polynomials of degree O(1/ε^2) (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank 4/ε^2 and a proof that any rank-r DT can be ε-approximated by a DT of depth \frac52(r+\log(1/ε)). We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time \tildeO(n^2) ⋅2^O(1/ε^4). The best previous algorithm for the problem requires n^O(1/ε^2) time and examples (Cheraghchi et al., 2012) but works also in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of 2^Ω(1/ε^2/3) on the complexity of learning monotone submodular functions  in any reasonable model (including learning with value queries); (2) computational lower bound of n^Ω(1/ε^2/3) based on a reduction to learning of sparse parities with noise, widely-believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.}
}

Endnote

%0 Conference Paper
%T Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees
%A Vitaly Feldman
%A Pravesh Kothari
%A Jan Vondrák
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart	
%F pmlr-v30-Feldman13
%I PMLR
%P 711--740
%U https://proceedings.mlr.press/v30/Feldman13.html
%V 30
%X We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube {0,1}^n. Our main result is the following structural theorem: any submodular function is ε-close in \ell_2 to a real-valued decision tree (DT) of depth O(1/ε^2). This immediately implies that any submodular function is  ε-close to a function of at most 2^O(1/ε^2) variables and has a spectral \ell_1 norm of 2^O(1/ε^2). It also implies the closest previous result that states that submodular functions can be approximated by polynomials of degree O(1/ε^2) (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank 4/ε^2 and a proof that any rank-r DT can be ε-approximated by a DT of depth \frac52(r+\log(1/ε)). We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time \tildeO(n^2) ⋅2^O(1/ε^4). The best previous algorithm for the problem requires n^O(1/ε^2) time and examples (Cheraghchi et al., 2012) but works also in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of 2^Ω(1/ε^2/3) on the complexity of learning monotone submodular functions  in any reasonable model (including learning with value queries); (2) computational lower bound of n^Ω(1/ε^2/3) based on a reduction to learning of sparse parities with noise, widely-believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.

RIS


TY  - CPAPER
TI  - Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees
AU  - Vitaly Feldman
AU  - Pravesh Kothari
AU  - Jan Vondrák
BT  - Proceedings of the 26th Annual Conference on Learning Theory
DA  - 2013/06/13
ED  - Shai Shalev-Shwartz
ED  - Ingo Steinwart	
ID  - pmlr-v30-Feldman13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 30
SP  - 711
EP  - 740
L1  - http://proceedings.mlr.press/v30/Feldman13.pdf
UR  - https://proceedings.mlr.press/v30/Feldman13.html
AB  - We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube {0,1}^n. Our main result is the following structural theorem: any submodular function is ε-close in \ell_2 to a real-valued decision tree (DT) of depth O(1/ε^2). This immediately implies that any submodular function is  ε-close to a function of at most 2^O(1/ε^2) variables and has a spectral \ell_1 norm of 2^O(1/ε^2). It also implies the closest previous result that states that submodular functions can be approximated by polynomials of degree O(1/ε^2) (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank 4/ε^2 and a proof that any rank-r DT can be ε-approximated by a DT of depth \frac52(r+\log(1/ε)). We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time \tildeO(n^2) ⋅2^O(1/ε^4). The best previous algorithm for the problem requires n^O(1/ε^2) time and examples (Cheraghchi et al., 2012) but works also in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of 2^Ω(1/ε^2/3) on the complexity of learning monotone submodular functions  in any reasonable model (including learning with value queries); (2) computational lower bound of n^Ω(1/ε^2/3) based on a reduction to learning of sparse parities with noise, widely-believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.
ER  -

APA


Feldman, V., Kothari, P. & Vondrák, J.. (2013). Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:711-740 Available from https://proceedings.mlr.press/v30/Feldman13.html.

Related Material

Download PDF