Generalization and Learnability in Multiple Instance Regression

Kushal Chauhan, Rishi Saket, Lorne Applebaum, Ashwinkumar Badanidiyuru, Chandan Giri, Aravindan Raghuveer
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:603-618, 2024.

Abstract

Multiple instance regression (MIR) was introduced by Ray and Page (2001) as an analogue of multiple instance learning (MIL) in which we are given bags of feature-vectors (instances) and for each bag there is a bag-label which matches the label of one (unknown) primary instance from that bag. The goal is to compute a hypothesis regressor consistent with the underlying instance-labels. A natural approach is to find the best primary instance assignment and regressor optimizing the MSE loss on the bags, though no formal generalization guarantees were known. Our work is the first to prove generalization error bounds for MIR when the bags are drawn i.i.d. at random. Essentially, with high probability any MIR regressor with low error on sampled bags also has low error on the underlying instance-label distribution. We next study the complexity of linear regression on MIR bags, shown to be NP-hard in general by Ray and Page (2001), who however left open the possibility of arbitrarily good approximations. Significantly strengthening previous work, we prove a strong inapproximability bound: even if there exists a zero bag-loss MIR linear regressor on a collection of $2$-sized bags with labels in $[-1,1]$, it is NP-hard to find an MIR linear regressor with bag-loss $< C$ for some absolute constant $C > 0$. Our work also proposes a model training method for MIR based on a novel weighted assignment loss, geared towards handling overlapping bags, which have not received much attention previously. We conduct empirical evaluations on synthetic and real-world datasets showing that our method outperforms the baseline MIR methods.
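The "natural approach" mentioned above (alternate between assigning each bag's primary instance and refitting the regressor to minimize bag-level MSE) can be illustrated with a minimal NumPy sketch. This is a hypothetical baseline illustration on synthetic data, not the paper's proposed weighted-assignment method; all names and data here are made up for the example.

```python
# Sketch of the natural MIR baseline: alternating optimization between
# (a) picking, per bag, the instance whose prediction best matches the
# bag label, and (b) refitting a linear regressor on those instances.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0])

# Synthetic data: 200 bags, 3 two-dimensional instances each; the bag
# label equals the label of one hidden "primary" instance per bag.
bags = rng.normal(size=(200, 3, 2))
primary = rng.integers(0, 3, size=200)
bag_labels = bags[np.arange(200), primary] @ w_true

def bag_loss(w):
    # Bag-level MSE under the best instance assignment for regressor w.
    preds = bags @ w  # shape (200, 3)
    return np.min((preds - bag_labels[:, None]) ** 2, axis=1).mean()

w = rng.normal(size=2)
losses = [bag_loss(w)]
for _ in range(30):
    # (a) Assign: per bag, pick the instance minimizing squared error.
    preds = bags @ w
    pick = np.argmin((preds - bag_labels[:, None]) ** 2, axis=1)
    X = bags[np.arange(len(bags)), pick]
    # (b) Refit: least-squares regression on the assigned instances.
    w, *_ = np.linalg.lstsq(X, bag_labels, rcond=None)
    losses.append(bag_loss(w))
```

Each iteration cannot increase the bag loss (the refit minimizes MSE for the fixed assignment, and the reassignment only lowers each bag's term), so the loss sequence is monotonically non-increasing, though, per the paper's hardness result, reaching low bag loss is not guaranteed in general.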

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-chauhan24a,
  title = {Generalization and Learnability in Multiple Instance Regression},
  author = {Chauhan, Kushal and Saket, Rishi and Applebaum, Lorne and Badanidiyuru, Ashwinkumar and Giri, Chandan and Raghuveer, Aravindan},
  booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages = {603--618},
  year = {2024},
  editor = {Kiyavash, Negar and Mooij, Joris M.},
  volume = {244},
  series = {Proceedings of Machine Learning Research},
  month = {15--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/chauhan24a/chauhan24a.pdf},
  url = {https://proceedings.mlr.press/v244/chauhan24a.html},
  abstract = {Multiple instance regression (MIR) was introduced by Ray and Page (2001) as an analogue of multiple instance learning (MIL) in which we are given bags of feature-vectors (instances) and for each bag there is a bag-label which matches the label of one (unknown) primary instance from that bag. The goal is to compute a hypothesis regressor consistent with the underlying instance-labels. A natural approach is to find the best primary instance assignment and regressor optimizing the MSE loss on the bags, though no formal generalization guarantees were known. Our work is the first to prove generalization error bounds for MIR when the bags are drawn i.i.d. at random. Essentially, with high probability any MIR regressor with low error on sampled bags also has low error on the underlying instance-label distribution. We next study the complexity of linear regression on MIR bags, shown to be NP-hard in general by Ray and Page (2001), who however left open the possibility of arbitrarily good approximations. Significantly strengthening previous work, we prove a strong inapproximability bound: even if there exists a zero bag-loss MIR linear regressor on a collection of $2$-sized bags with labels in $[-1,1]$, it is NP-hard to find an MIR linear regressor with bag-loss $< C$ for some absolute constant $C > 0$. Our work also proposes a model training method for MIR based on a novel weighted assignment loss, geared towards handling overlapping bags, which have not received much attention previously. We conduct empirical evaluations on synthetic and real-world datasets showing that our method outperforms the baseline MIR methods.}
}
Endnote
%0 Conference Paper
%T Generalization and Learnability in Multiple Instance Regression
%A Kushal Chauhan
%A Rishi Saket
%A Lorne Applebaum
%A Ashwinkumar Badanidiyuru
%A Chandan Giri
%A Aravindan Raghuveer
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij
%F pmlr-v244-chauhan24a
%I PMLR
%P 603--618
%U https://proceedings.mlr.press/v244/chauhan24a.html
%V 244
%X Multiple instance regression (MIR) was introduced by Ray and Page (2001) as an analogue of multiple instance learning (MIL) in which we are given bags of feature-vectors (instances) and for each bag there is a bag-label which matches the label of one (unknown) primary instance from that bag. The goal is to compute a hypothesis regressor consistent with the underlying instance-labels. A natural approach is to find the best primary instance assignment and regressor optimizing the MSE loss on the bags, though no formal generalization guarantees were known. Our work is the first to prove generalization error bounds for MIR when the bags are drawn i.i.d. at random. Essentially, with high probability any MIR regressor with low error on sampled bags also has low error on the underlying instance-label distribution. We next study the complexity of linear regression on MIR bags, shown to be NP-hard in general by Ray and Page (2001), who however left open the possibility of arbitrarily good approximations. Significantly strengthening previous work, we prove a strong inapproximability bound: even if there exists a zero bag-loss MIR linear regressor on a collection of $2$-sized bags with labels in $[-1,1]$, it is NP-hard to find an MIR linear regressor with bag-loss $< C$ for some absolute constant $C > 0$. Our work also proposes a model training method for MIR based on a novel weighted assignment loss, geared towards handling overlapping bags, which have not received much attention previously. We conduct empirical evaluations on synthetic and real-world datasets showing that our method outperforms the baseline MIR methods.
APA
Chauhan, K., Saket, R., Applebaum, L., Badanidiyuru, A., Giri, C. & Raghuveer, A. (2024). Generalization and Learnability in Multiple Instance Regression. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:603-618. Available from https://proceedings.mlr.press/v244/chauhan24a.html.