[edit]

# Near Optimal Distributed Learning of Halfspaces with Two Parties

*Proceedings of Thirty Fourth Conference on Learning Theory*, PMLR 134:724-758, 2021.

#### Abstract

*Distributed learning*protocols are designed to train on distributed data without gathering it all on a single centralized machine, thus contributing to the efficiency of the system and enhancing its privacy. We study a central problem in distributed learning, called {\it distributed learning of halfspaces}: let $U \subseteq \mathbb{R}^d$ be a known domain of size $n$ and let $h:\mathbb{R}^d\to \mathbb{R}$ be an unknown target affine function.\footnote{In practice, the domain $U$ is defined implicitly by the representation of $d$-dimensional vectors which is used in the protocol.} A set of

*examples*$\{(u,b)\}$ is distributed between several parties, where~$u \in U$ is a point and $b = \mathsf{sign}(h(u)) \in \{\pm 1\}$ is its label. The parties goal is to agree on a classifier~$f: U\to\{\pm 1\}$ such that~$f(u)=b$ for every input example~$(u,b)$. We design a protocol for the distributed halfspace learning problem in the two-party setting, communicating only $\tilde O(d\log n)$ bits. To this end, we introduce a new tool called

*halfspace containers*, that is closely related to

*bracketing numbers*in statistics and to

*hyperplane cuttings*in discrete geometry, and allows for a compressed approximate representation of every halfspace. We complement our upper bound result by an almost matching $\tilde \Omega(d\log n)$ lower bound on the communication complexity of any such protocol Since the distributed halfspace learning problem is closely related to the

*convex set disjointness*problem in communication complexity and the problem of

*distributed linear programming*in distributed optimization, we also derive upper and lower bounds of $\tilde O(d^2\log n)$ and~$\tilde{\Omega}(d\log n)$ on the communication complexity of both of these basic problems.