Bridging Domains with Approximately Shared Features

Ziliang Samuel Zhong, Xiang Pan, Qi Lei
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:559-567, 2025.

Abstract

Machine learning models can suffer performance degradation when applied to new tasks due to distribution shifts. Feature representation learning offers a robust solution to this issue, but a fundamental challenge remains: devising the optimal strategy for feature selection. The existing literature is somewhat paradoxical: some works advocate learning invariant features from source domains, while others favor more diverse features. To better understand this tension, we propose a statistical framework that evaluates the utility of each feature (i.e., how differently the features are used in each source task) based on the variance of its correlation with $y$ across domains. Under our framework, we design and analyze a learning procedure that learns content features (comprising both invariant and approximately shared features) from source tasks and fine-tunes them on the target task. Our theoretical analysis highlights the significance of learning approximately shared features, not merely strictly invariant ones, when distribution shifts occur, and yields an improved population risk on target tasks compared to previous results. Inspired by our theory, we introduce ProjectionNet, a practical method that distinguishes content features from environmental features via explicit feature space control, further consolidating our theoretical findings.
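To make the utility measure concrete, below is a minimal NumPy sketch, not the authors' released code: it scores each feature by the variance, across source domains, of its correlation with $y$, then splits features into invariant, approximately shared, and environmental groups. The synthetic domains, function names, and thresholds are illustrative assumptions only.

# Hypothetical sketch of the variance-of-correlation utility measure;
# thresholds and data are illustrative, not the authors' implementation.
import numpy as np

def correlation_profile(domains):
    """domains: list of (X, y) pairs, one per source domain.
    Returns (mean, variance) over domains of corr(x_j, y) per feature j."""
    corrs = []
    for X, y in domains:
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        # Pearson correlation of every feature column with y in this domain.
        corr = (Xc * yc[:, None]).mean(axis=0) / (Xc.std(axis=0) * yc.std() + 1e-12)
        corrs.append(corr)
    corrs = np.stack(corrs)            # shape: (num_domains, num_features)
    return corrs.mean(axis=0), corrs.var(axis=0)

# Synthetic source domains: feature 0 has a fixed coefficient and the noise is
# scaled so Var(y) stays constant, making its correlation invariant; feature 1's
# coefficient w drifts across domains (approximately shared); features 2-3 are
# unrelated to y (environmental).
rng = np.random.default_rng(0)
domains = []
for w in (0.3, 0.7, 1.0):
    X = rng.normal(size=(1000, 4))
    noise = rng.normal(size=1000) * np.sqrt(1.25 - w**2)   # keeps Var(y) = 2.25
    domains.append((X, X[:, 0] + w * X[:, 1] + noise))

mean_c, var_c = correlation_profile(domains)
invariant = (var_c < 0.01) & (np.abs(mean_c) > 0.1)
approx_shared = ~invariant & (var_c < 0.1) & (np.abs(mean_c) > 0.1)
content = invariant | approx_shared    # features kept for target fine-tuning
print(dict(invariant=np.where(invariant)[0], approx_shared=np.where(approx_shared)[0]))

In this toy setup the split recovers feature 0 as invariant and feature 1 as approximately shared; their union is the set of content features that the paper's procedure would retain and fine-tune on the target task.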

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-zhong25a,
  title     = {Bridging Domains with Approximately Shared Features},
  author    = {Zhong, Ziliang Samuel and Pan, Xiang and Lei, Qi},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {559--567},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/zhong25a/zhong25a.pdf},
  url       = {https://proceedings.mlr.press/v258/zhong25a.html}
}
Endnote
%0 Conference Paper
%T Bridging Domains with Approximately Shared Features
%A Ziliang Samuel Zhong
%A Xiang Pan
%A Qi Lei
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-zhong25a
%I PMLR
%P 559--567
%U https://proceedings.mlr.press/v258/zhong25a.html
%V 258
APA
Zhong, Z.S., Pan, X. & Lei, Q. (2025). Bridging Domains with Approximately Shared Features. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:559-567. Available from https://proceedings.mlr.press/v258/zhong25a.html.