Statistical Learning from Attribution Sets

Lorne Applebaum; Robert Busa-Fekete; August Chen; Claudio Gentile; Tomer Koren; Aryan Mokhtari

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:290-336, 2026.

Abstract

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-applebaum26a,
  title     = {Statistical Learning from Attribution Sets},
  author    = {Applebaum, Lorne and Busa-Fekete, Robert and Chen, August and Gentile, Claudio and Koren, Tomer and Mokhtari, Aryan},
  booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages     = {290--336},
  year      = {2026},
  editor    = {Hanneke, Steve and Lattimore, Tor},
  volume    = {336},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/applebaum26a/applebaum26a.pdf},
  url       = {https://proceedings.mlr.press/v336/applebaum26a.html},
  abstract = 	 {We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.}
}

Endnote

%0 Conference Paper
%T Statistical Learning from Attribution Sets
%A Lorne Applebaum
%A Robert Busa-Fekete
%A August Chen
%A Claudio Gentile
%A Tomer Koren
%A Aryan Mokhtari
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-applebaum26a
%I PMLR
%P 290--336
%U https://proceedings.mlr.press/v336/applebaum26a.html
%V 336
%X We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.

APA

Applebaum, L., Busa-Fekete, R., Chen, A., Gentile, C., Koren, T. & Mokhtari, A. (2026). Statistical Learning from Attribution Sets. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:290-336. Available from https://proceedings.mlr.press/v336/applebaum26a.html.
