Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

Hussein Hazimeh, Rahul Mazumder
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1833-1843, 2020.

Abstract

In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can often be enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization-based algorithms face difficulties in handling problems where the number of main features p ∼ 10^3 (with total number of features ∼ p^2). In this paper, we study a convex relaxation which enforces strong hierarchy and develop a highly scalable algorithm based on proximal gradient descent. We introduce novel screening rules that allow for solving the complicated proximal problem in parallel. In addition, we introduce a specialized active-set strategy with gradient screening for avoiding costly gradient computations. The framework can handle problems having dense design matrices, with p = 50,000 (∼ 10^9 interactions)—instances that are much larger than the state of the art. Experiments on real and synthetic data suggest that our toolkit hierScale outperforms the state of the art in terms of prediction and variable selection and can achieve over a 4900x speed-up.
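To make the constraint concrete, the model class augments the p main effects with pairwise products, and strong hierarchy ties the two coefficient groups together. The notation below is a schematic reconstruction from the abstract, not an excerpt from the paper:

    y \approx \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \sum_{i<j} \theta_{ij} x_i x_j,
    \qquad \text{strong hierarchy: } \theta_{ij} \neq 0 \Rightarrow \beta_i \neq 0 \text{ and } \beta_j \neq 0.

With p main features there are p(p-1)/2 interaction terms, which is where the ∼ p^2 total feature count comes from; at p = 50,000 that is roughly 1.25 × 10^9 interactions, matching the ∼ 10^9 figure above.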
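The algorithmic template named in the abstract is proximal gradient descent. The sketch below is a minimal stand-in: it uses a plain L1 penalty, whose proximal operator is coordinate-wise soft-thresholding, whereas the paper's hierarchy-enforcing penalty couples main effects with interactions and its much harder proximal problem is what the screening rules decompose and parallelize. Function names here are illustrative, not hierScale's API.

    import numpy as np

    def soft_threshold(v, t):
        # Prox of t * ||.||_1: shrink every coordinate toward zero by t.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def prox_grad(X, y, lam, n_iter=500):
        # Proximal gradient descent for 0.5 * ||y - X b||^2 + lam * ||b||_1.
        # The L1 prox is a stand-in for the paper's hierarchy-enforcing prox.
        step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
        b = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = X.T @ (X @ b - y)
            b = soft_threshold(b - step * grad, step * lam)
        return b

    # Toy usage: augment 10 main features with all 45 pairwise products.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]
    X_aug = np.hstack([X] + [X[:, [i]] * X[:, [j]] for i, j in pairs])
    y = X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(200)
    b_hat = prox_grad(X_aug, y, lam=5.0)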
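Likewise, a generic active-set loop conveys the second ingredient: optimize over a small working set, then scan the gradient for coordinates that violate the optimality conditions and promote them. The paper's gradient-screening refinement exists precisely to avoid this full-gradient scan at scale; the hypothetical helper below shows only the generic pattern, not hierScale's actual strategy.

    def active_set_fit(X, y, lam, n_outer=20, tol=1e-8):
        # Generic active-set template (illustrative, for the L1 problem above).
        n, d = X.shape
        b = np.zeros(d)
        active = np.zeros(d, dtype=bool)
        for _ in range(n_outer):
            if active.any():
                # Solve the subproblem restricted to the current working set.
                b = np.zeros(d)
                b[active] = prox_grad(X[:, active], y, lam)
            # Full-gradient scan: for the L1 problem, a zero coordinate
            # violates optimality when |grad_j| exceeds lam.
            grad = X.T @ (X @ b - y)
            violating = (~active) & (np.abs(grad) > lam + tol)
            if not violating.any():
                break
            active |= violating  # promote violators and re-solve
        return b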

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-hazimeh20a,
  title = {Learning Hierarchical Interactions at Scale: A Convex Optimization Approach},
  author = {Hazimeh, Hussein and Mazumder, Rahul},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {1833--1843},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/hazimeh20a/hazimeh20a.pdf},
  url = {https://proceedings.mlr.press/v108/hazimeh20a.html},
  abstract = {In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can often be enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization-based algorithms face difficulties in handling problems where the number of main features p ∼ 10^3 (with total number of features ∼ p^2). In this paper, we study a convex relaxation which enforces strong hierarchy and develop a highly scalable algorithm based on proximal gradient descent. We introduce novel screening rules that allow for solving the complicated proximal problem in parallel. In addition, we introduce a specialized active-set strategy with gradient screening for avoiding costly gradient computations. The framework can handle problems having dense design matrices, with p = 50,000 (∼ 10^9 interactions)—instances that are much larger than the state of the art. Experiments on real and synthetic data suggest that our toolkit hierScale outperforms the state of the art in terms of prediction and variable selection and can achieve over a 4900x speed-up.}
}
Endnote
%0 Conference Paper
%T Learning Hierarchical Interactions at Scale: A Convex Optimization Approach
%A Hussein Hazimeh
%A Rahul Mazumder
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-hazimeh20a
%I PMLR
%P 1833--1843
%U https://proceedings.mlr.press/v108/hazimeh20a.html
%V 108
%X In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can often be enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization-based algorithms face difficulties in handling problems where the number of main features p ∼ 10^3 (with total number of features ∼ p^2). In this paper, we study a convex relaxation which enforces strong hierarchy and develop a highly scalable algorithm based on proximal gradient descent. We introduce novel screening rules that allow for solving the complicated proximal problem in parallel. In addition, we introduce a specialized active-set strategy with gradient screening for avoiding costly gradient computations. The framework can handle problems having dense design matrices, with p = 50,000 (∼ 10^9 interactions)—instances that are much larger than the state of the art. Experiments on real and synthetic data suggest that our toolkit hierScale outperforms the state of the art in terms of prediction and variable selection and can achieve over a 4900x speed-up.
APA
Hazimeh, H. & Mazumder, R. (2020). Learning Hierarchical Interactions at Scale: A Convex Optimization Approach. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1833-1843. Available from https://proceedings.mlr.press/v108/hazimeh20a.html.