Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs

Ruichen Zhang, Mufan Qiu, Zhen Tan, Mohan Zhang, Xiaopeng Lu, Jie Peng, Kaidi Xu, Leandro Z. Agudelo, Peter Zhenghao Qian, Tianlong Chen
Conference on Parsimony and Learning, PMLR 328:375-427, 2026.

Abstract

Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used either for demonstration retrieval (for large LLMs) or to distill small LLMs (e.g., Llama3) in a process that remains decoupled from the exploration. In this paper, we propose AgentSymbiotic, an iterative framework that couples data synthesis with task performance, yielding a “symbiotic improvement” for both large and small LLMs. Our study uncovers a complementary dynamic between LLM types: while large LLMs excel at generating high-quality trajectories for distillation, the distilled small LLMs—owing to their distinct reasoning capabilities—often choose actions that diverge from those of their larger counterparts. This divergence drives the exploration of novel trajectories, thereby enriching the synthesized data. However, we also observe that the performance of small LLMs becomes a bottleneck in this iterative enhancement process. To address this, we propose two innovations in LLM distillation: a speculative data synthesis strategy that mitigates off-policy bias, and a multi-task learning approach designed to boost the reasoning capabilities of the student LLM. Furthermore, we introduce a hybrid mode for privacy preservation to address user privacy concerns. Evaluated on the WebArena benchmark, AgentSymbiotic achieves state-of-the-art performance with both LLM types. Our best large-LLM agent reaches 52%, surpassing the previous best of 45%, while our 8B distilled model achieves 49%, effectively compressing the intelligence of large models into a compact, inference-efficient agent that reduces deployment costs while matching SoTA performance. Code is released at: https://anonymous.4open.science/r/agent-0E80/README.md

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-zhang26a,
  title     = {Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs},
  author    = {Zhang, Ruichen and Qiu, Mufan and Tan, Zhen and Zhang, Mohan and Lu, Xiaopeng and Peng, Jie and Xu, Kaidi and Agudelo, Leandro Z. and Qian, Peter Zhenghao and Chen, Tianlong},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {375--427},
  year      = {2026},
  editor    = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume    = {328},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/zhang26a/zhang26a.pdf},
  url       = {https://proceedings.mlr.press/v328/zhang26a.html},
  abstract  = {Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used either for demonstration retrieval (for large LLMs) or to distill small LLMs (e.g., Llama3) in a process that remains decoupled from the exploration. In this paper, we propose AgentSymbiotic, an iterative framework that couples data synthesis with task performance, yielding a “symbiotic improvement” for both large and small LLMs. Our study uncovers a complementary dynamic between LLM types: while large LLMs excel at generating high-quality trajectories for distillation, the distilled small LLMs—owing to their distinct reasoning capabilities—often choose actions that diverge from those of their larger counterparts. This divergence drives the exploration of novel trajectories, thereby enriching the synthesized data. However, we also observe that the performance of small LLMs becomes a bottleneck in this iterative enhancement process. To address this, we propose two innovations in LLM distillation: a speculative data synthesis strategy that mitigates off-policy bias, and a multi-task learning approach designed to boost the reasoning capabilities of the student LLM. Furthermore, we introduce a hybrid mode for privacy preservation to address user privacy concerns. Evaluated on the WebArena benchmark, AgentSymbiotic achieves state-of-the-art performance with both LLM types. Our best large-LLM agent reaches 52%, surpassing the previous best of 45%, while our 8B distilled model achieves 49%, effectively compressing the intelligence of large models into a compact, inference-efficient agent that reduces deployment costs while matching SoTA performance. Code is released at: https://anonymous.4open.science/r/agent-0E80/README.md}
}
Endnote
%0 Conference Paper
%T Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs
%A Ruichen Zhang
%A Mufan Qiu
%A Zhen Tan
%A Mohan Zhang
%A Xiaopeng Lu
%A Jie Peng
%A Kaidi Xu
%A Leandro Z. Agudelo
%A Peter Zhenghao Qian
%A Tianlong Chen
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-zhang26a
%I PMLR
%P 375--427
%U https://proceedings.mlr.press/v328/zhang26a.html
%V 328
%X Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used either for demonstration retrieval (for large LLMs) or to distill small LLMs (e.g., Llama3) in a process that remains decoupled from the exploration. In this paper, we propose AgentSymbiotic, an iterative framework that couples data synthesis with task performance, yielding a “symbiotic improvement” for both large and small LLMs. Our study uncovers a complementary dynamic between LLM types: while large LLMs excel at generating high-quality trajectories for distillation, the distilled small LLMs—owing to their distinct reasoning capabilities—often choose actions that diverge from those of their larger counterparts. This divergence drives the exploration of novel trajectories, thereby enriching the synthesized data. However, we also observe that the performance of small LLMs becomes a bottleneck in this iterative enhancement process. To address this, we propose two innovations in LLM distillation: a speculative data synthesis strategy that mitigates off-policy bias, and a multi-task learning approach designed to boost the reasoning capabilities of the student LLM. Furthermore, we introduce a hybrid mode for privacy preservation to address user privacy concerns. Evaluated on the WebArena benchmark, AgentSymbiotic achieves state-of-the-art performance with both LLM types. Our best large-LLM agent reaches 52%, surpassing the previous best of 45%, while our 8B distilled model achieves 49%, effectively compressing the intelligence of large models into a compact, inference-efficient agent that reduces deployment costs while matching SoTA performance. Code is released at: https://anonymous.4open.science/r/agent-0E80/README.md
APA
Zhang, R., Qiu, M., Tan, Z., Zhang, M., Lu, X., Peng, J., Xu, K., Agudelo, L.Z., Qian, P.Z. & Chen, T. (2026). Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:375-427. Available from https://proceedings.mlr.press/v328/zhang26a.html.
