On the inductive bias of neural networks for learning read-once DNFs

Ido Bronstein, Alon Brutzkus, Amir Globerson
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:255-265, 2022.

Abstract

Learning functions over Boolean variables is a fundamental problem in machine learning. But not much is known about learning such functions using neural networks. Here we focus on learning read-once disjunctive normal forms (DNFs) under the uniform distribution with a convex neural network and gradient methods. We first observe empirically that gradient methods converge to compact solutions with neurons that are aligned with the terms of the DNF. This is despite the fact that there are many zero training error networks that do not have this property. Thus, the learning process has a clear inductive bias towards such logical formulas. Following recent results which connect the inductive bias of gradient flow (GF) to Karush-Kuhn-Tucker (KKT) points of minimum norm problems, we study these KKT points in our setting. We prove that zero training error solutions that memorize training points are not KKT points and therefore GF cannot converge to them. On the other hand, we prove that globally optimal KKT points correspond exactly to networks that are aligned with the DNF terms. These results suggest a strong connection between the inductive bias of GF and solutions that align with the DNF. We conclude with extensive experiments which verify our findings.
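For readers who want a concrete picture of the setup, below is a minimal sketch (an illustration, not the authors' code). It samples uniform Boolean inputs labeled by a small read-once DNF and runs plain gradient descent on a two-layer ReLU network whose second-layer weights are fixed, which is one common reading of the "convex neural network" mentioned above; the term sizes, width, loss, and learning rate are illustrative assumptions. The "minimum norm problems" referred to in the abstract are, in the standard gradient-flow implicit-bias formulation, problems of the form min ||theta||^2 subject to y_i f_theta(x_i) >= 1 on the training set, and the paper studies the KKT points of this problem in the read-once DNF setting.

import numpy as np

rng = np.random.default_rng(0)

# Data: x ~ Uniform({-1, +1}^d); label is +1 iff some DNF term is satisfied.
d, n = 30, 2000
terms = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]        # disjoint variable sets => read-once
X = rng.choice([-1.0, 1.0], size=(n, d))
sat = np.stack([(X[:, t] == 1).all(axis=1) for t in terms], axis=1)
y = np.where(sat.any(axis=1), 1.0, -1.0)
Xb = np.hstack([X, np.ones((n, 1))])             # constant column acts as a bias input

# Model: f(x) = sum_j a_j * relu(w_j . x); second layer a is fixed, first layer W is trained.
k = 50
W = rng.normal(scale=0.1, size=(k, d + 1))
a = np.concatenate([np.full(k // 2, 1.0), np.full(k - k // 2, -1.0)]) / k

def forward(W, Xb):
    pre = Xb @ W.T                               # (n, k) pre-activations
    return np.maximum(pre, 0.0) @ a, pre

# Training: plain gradient descent on the logistic loss.
lr = 1.0
for step in range(5000):
    out, pre = forward(W, Xb)
    coef = -y / (1.0 + np.exp(np.clip(y * out, -30, 30)))   # d(loss_i)/d(out_i), logistic loss
    grad_W = ((coef[:, None] * (pre > 0)) * a).T @ Xb / n   # backprop through the ReLU layer
    W -= lr * grad_W

acc = ((forward(W, Xb)[0] > 0) == (y > 0)).mean()
print(f"train accuracy: {acc:.3f}")
# The inductive bias reported in the paper: the useful positive neurons tend to align with
# individual DNF terms (here, with the coordinate sets listed in `terms`).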

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-bronstein22a,
  title     = {On the inductive bias of neural networks for learning read-once DNFs},
  author    = {Bronstein, Ido and Brutzkus, Alon and Globerson, Amir},
  booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages     = {255--265},
  year      = {2022},
  editor    = {Cussens, James and Zhang, Kun},
  volume    = {180},
  series    = {Proceedings of Machine Learning Research},
  month     = {01--05 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v180/bronstein22a/bronstein22a.pdf},
  url       = {https://proceedings.mlr.press/v180/bronstein22a.html}
}
Endnote
%0 Conference Paper
%T On the inductive bias of neural networks for learning read-once DNFs
%A Ido Bronstein
%A Alon Brutzkus
%A Amir Globerson
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-bronstein22a
%I PMLR
%P 255--265
%U https://proceedings.mlr.press/v180/bronstein22a.html
%V 180
APA
Bronstein, I., Brutzkus, A. & Globerson, A. (2022). On the inductive bias of neural networks for learning read-once DNFs. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:255-265. Available from https://proceedings.mlr.press/v180/bronstein22a.html.
