Enhancing Long-Context Inference with Context-Position Duo-Mixture

Zhenyu Zhang, Sharath Nittur Sridhar, Zhangyang Wang, Souvik Kundu
Conference on Parsimony and Learning, PMLR 328:1112-1124, 2026.

Abstract

The long-context understanding ability of existing large language models (LLMs) is generally limited by their pre-training context window, so their effectiveness degrades as the context length increases. Moreover, even within the pre-training context length, LLMs often fail to capture vital information located in the middle of the context window. To mitigate these limitations, we introduce the context-position duo-mixture (CoPMix) for LLMs, a simple yet effective training-free method designed to enhance long-context understanding in terms of both effectiveness and context awareness. Specifically, we present an input context chunking and mixing strategy that divides long sequences into multiple chunks, each accompanied by a shared context sink. The input query attends to all chunks in parallel, enabling efficient integration of information across chunks. We then introduce an adaptive assignment of positional information to enhance context awareness. This duo-mixture strategy reduces the quadratic complexity of attention to sub-quadratic while improving long-context processing performance. Extensive experiments across multiple LLMs on diverse long-context datasets demonstrate that CoPMix achieves up to a 9.79% accuracy improvement over existing alternatives, while reducing pre-filling latency by up to 69.14% compared to the full-attention LLM alternative.
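The chunk-and-sink prefill described in the abstract can be illustrated with a small sketch. The snippet below is not the authors' CoPMix implementation; it is a minimal single-head illustration, assuming arbitrary chunk and sink sizes, omitting the adaptive positional assignment, and using a log-sum-exp merge (all of these are illustrative choices, not details from the paper). Each chunk is prefixed with a shared context sink, the query attends to every chunk independently (the per-chunk computations could run in parallel), and the per-chunk softmax outputs are then combined.

```python
# Minimal sketch (not the authors' implementation) of query attention over
# context chunks that each share a common "context sink" prefix.
# Assumptions: single head, no positional re-assignment, arbitrary chunk/sink sizes.
import numpy as np

def softmax_stats(scores):
    """Return unnormalized weights plus (max, normalizer) so chunk outputs can be merged."""
    m = scores.max(axis=-1, keepdims=True)
    e = np.exp(scores - m)
    return e, m, e.sum(axis=-1, keepdims=True)

def chunked_sink_attention(q, K, V, chunk_size=4, sink_size=2):
    """q: (d,), K, V: (n, d). Attend to each [sink; chunk] block, then merge."""
    d = q.shape[-1]
    sink_K, sink_V = K[:sink_size], V[:sink_size]            # shared context sink
    rest_K, rest_V = K[sink_size:], V[sink_size:]
    outputs, maxes, norms = [], [], []
    for start in range(0, rest_K.shape[0], chunk_size):       # chunks are independent, could run in parallel
        Kc = np.concatenate([sink_K, rest_K[start:start + chunk_size]])
        Vc = np.concatenate([sink_V, rest_V[start:start + chunk_size]])
        scores = Kc @ q / np.sqrt(d)
        e, m, z = softmax_stats(scores)
        outputs.append(e @ Vc)
        maxes.append(m)
        norms.append(z)
    # Merge per-chunk results with a log-sum-exp correction: the result equals a
    # single softmax over all keys the chunks collectively saw (the sink keys
    # contribute once per chunk in this naive merge).
    g = max(float(m) for m in maxes)
    scale = [np.exp(float(m) - g) for m in maxes]
    total = sum(s * float(z) for s, z in zip(scale, norms))
    return sum(s * o for s, o in zip(scale, outputs)) / total

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((10, 8))
V = rng.standard_normal((10, 8))
print(chunked_sink_attention(q, K, V))
```

Because each chunk only attends within its own block (plus the small shared sink), the per-query cost grows with the chunk size rather than the full sequence length, which is the intuition behind the sub-quadratic complexity claim.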

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-zhang26c,
  title     = {Enhancing Long-Context Inference with Context-Position Duo-Mixture},
  author    = {Zhang, Zhenyu and Sridhar, Sharath Nittur and Wang, Zhangyang and Kundu, Souvik},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {1112--1124},
  year      = {2026},
  editor    = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume    = {328},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/zhang26c/zhang26c.pdf},
  url       = {https://proceedings.mlr.press/v328/zhang26c.html},
  abstract  = {Long-context understanding ability of the existing large language models (LLMs) is generally limited by their pre-training context window, providing limited effectiveness as context length increases. Moreover, even within the range of pre-training context length, LLMs often fail to capture vital information present in the middle of the context-window. Towards mitigating these limitations, we introduce context-position duo-mixture (CoPMix) of LLMs, a simple yet effective training-free method designed to enhance their long-context understanding performance in terms of both effectiveness as well as context awareness. Specifically, we present an input context chunking and mixing strategy that divides long sequences into multiple chunks, each accompanied by a shared context sink. The input query attends to all chunks in parallel, enabling the efficient integration of information across chunks. We then introduce an adaptive assignment of positional information to enhance the context awareness. This duo-mixture strategy reduces the quadratic complexity of attention to sub-quadratic while improving long-context processing performance. Extensive experiments across multiple LLMs on diverse long-context datasets demonstrate that CoPMix achieves up to a 9.79% accuracy improvement over the existing alternatives, while reducing the pre-filling latency by up to 69.14% compared to full attention LLM alternative.}
}
Endnote
%0 Conference Paper
%T Enhancing Long-Context Inference with Context-Position Duo-Mixture
%A Zhenyu Zhang
%A Sharath Nittur Sridhar
%A Zhangyang Wang
%A Souvik Kundu
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-zhang26c
%I PMLR
%P 1112--1124
%U https://proceedings.mlr.press/v328/zhang26c.html
%V 328
%X Long-context understanding ability of the existing large language models (LLMs) is generally limited by their pre-training context window, providing limited effectiveness as context length increases. Moreover, even within the range of pre-training context length, LLMs often fail to capture vital information present in the middle of the context-window. Towards mitigating these limitations, we introduce context-position duo-mixture (CoPMix) of LLMs, a simple yet effective training-free method designed to enhance their long-context understanding performance in terms of both effectiveness as well as context awareness. Specifically, we present an input context chunking and mixing strategy that divides long sequences into multiple chunks, each accompanied by a shared context sink. The input query attends to all chunks in parallel, enabling the efficient integration of information across chunks. We then introduce an adaptive assignment of positional information to enhance the context awareness. This duo-mixture strategy reduces the quadratic complexity of attention to sub-quadratic while improving long-context processing performance. Extensive experiments across multiple LLMs on diverse long-context datasets demonstrate that CoPMix achieves up to a 9.79% accuracy improvement over the existing alternatives, while reducing the pre-filling latency by up to 69.14% compared to full attention LLM alternative.
APA
Zhang, Z., Sridhar, S. N., Wang, Z., & Kundu, S. (2026). Enhancing Long-Context Inference with Context-Position Duo-Mixture. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:1112-1124. Available from https://proceedings.mlr.press/v328/zhang26c.html.
