Heterogeneous Data Game: Characterizing the Model Competition Across Multiple Data Sources

Renzhe Xu, Kang Wang, Bo Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69749-69771, 2025.

Abstract

Data heterogeneity across multiple sources is common in real-world machine learning (ML) settings. Although many methods focus on enabling a single model to handle diverse data, real-world markets often comprise multiple competing ML providers. In this paper, we propose a game-theoretic framework—the Heterogeneous Data Game—to analyze how such providers compete across heterogeneous data sources. We investigate the resulting pure Nash equilibria (PNE), showing that they can be non-existent, homogeneous (all providers converge on the same model), or heterogeneous (providers specialize in distinct data sources). Our analysis spans monopolistic, duopolistic, and more general markets, illustrating how factors such as the “temperature” of data-source choice models and the dominance of certain data sources shape equilibrium outcomes. We offer theoretical insights into both homogeneous and heterogeneous PNEs, guiding regulatory policies and practical strategies for competitive ML marketplaces.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-xu25ad,
  title = {Heterogeneous Data Game: Characterizing the Model Competition Across Multiple Data Sources},
  author = {Xu, Renzhe and Wang, Kang and Li, Bo},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {69749--69771},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xu25ad/xu25ad.pdf},
  url = {https://proceedings.mlr.press/v267/xu25ad.html},
  abstract = {Data heterogeneity across multiple sources is common in real-world machine learning (ML) settings. Although many methods focus on enabling a single model to handle diverse data, real-world markets often comprise multiple competing ML providers. In this paper, we propose a game-theoretic framework—the Heterogeneous Data Game—to analyze how such providers compete across heterogeneous data sources. We investigate the resulting pure Nash equilibria (PNE), showing that they can be non-existent, homogeneous (all providers converge on the same model), or heterogeneous (providers specialize in distinct data sources). Our analysis spans monopolistic, duopolistic, and more general markets, illustrating how factors such as the “temperature” of data-source choice models and the dominance of certain data sources shape equilibrium outcomes. We offer theoretical insights into both homogeneous and heterogeneous PNEs, guiding regulatory policies and practical strategies for competitive ML marketplaces.}
}
Endnote
%0 Conference Paper
%T Heterogeneous Data Game: Characterizing the Model Competition Across Multiple Data Sources
%A Renzhe Xu
%A Kang Wang
%A Bo Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-xu25ad
%I PMLR
%P 69749--69771
%U https://proceedings.mlr.press/v267/xu25ad.html
%V 267
%X Data heterogeneity across multiple sources is common in real-world machine learning (ML) settings. Although many methods focus on enabling a single model to handle diverse data, real-world markets often comprise multiple competing ML providers. In this paper, we propose a game-theoretic framework—the Heterogeneous Data Game—to analyze how such providers compete across heterogeneous data sources. We investigate the resulting pure Nash equilibria (PNE), showing that they can be non-existent, homogeneous (all providers converge on the same model), or heterogeneous (providers specialize in distinct data sources). Our analysis spans monopolistic, duopolistic, and more general markets, illustrating how factors such as the “temperature” of data-source choice models and the dominance of certain data sources shape equilibrium outcomes. We offer theoretical insights into both homogeneous and heterogeneous PNEs, guiding regulatory policies and practical strategies for competitive ML marketplaces.
APA
Xu, R., Wang, K., & Li, B. (2025). Heterogeneous Data Game: Characterizing the Model Competition Across Multiple Data Sources. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:69749-69771. Available from https://proceedings.mlr.press/v267/xu25ad.html.

Related Material