Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
August 6, 2025
Authors: Mo Li, L. H. Xu, Qitai Tan, Ting Cao, Yunxin Liu
cs.AI
Abstract
Large Language Models (LLMs) suffer from significant performance degradation when processing long contexts due to proactive interference, where irrelevant information in earlier parts of the context disrupts reasoning and memory recall. While most research focuses on external memory systems to augment LLMs' capabilities, we propose a complementary approach: empowering LLMs with Active Context Management (ACM) tools to actively sculpt their internal working memory. We introduce Sculptor, a framework that equips LLMs with three categories of tools: (1) context fragmentation; (2) summarization, hiding, and restoration; and (3) intelligent search. Our approach enables LLMs to proactively manage their attention and working memory, analogous to how humans selectively focus on relevant information while filtering out distractions. Experimental evaluation on two information-sparse benchmarks, PI-LLM (proactive interference) and NeedleBench Multi-Needle Reasoning, demonstrates that Sculptor significantly improves performance even without task-specific training, leveraging LLMs' inherent tool-calling generalization capabilities. By enabling Active Context Management, Sculptor not only mitigates proactive interference but also provides a cognitive foundation for more reliable reasoning across diverse long-context tasks, highlighting that explicit context-control strategies, rather than merely larger token windows, are key to robustness at scale.
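To make the three tool categories concrete, below is a minimal sketch of what ACM-style tools could look like when exposed to an LLM as callable functions. The class and method names (ContextManager, fragment_context, hide, restore, search, render) are illustrative assumptions for this sketch, not the paper's actual interface.

```python
# Hypothetical sketch of Sculptor-style Active Context Management (ACM) tools.
# All names and signatures here are illustrative assumptions, not the paper's API.

from dataclasses import dataclass, field


@dataclass
class Fragment:
    fragment_id: int
    text: str
    summary: str | None = None
    hidden: bool = False


@dataclass
class ContextManager:
    fragments: list[Fragment] = field(default_factory=list)

    # (1) Context fragmentation: split a long context into addressable chunks.
    def fragment_context(self, context: str, chunk_size: int = 2000) -> list[int]:
        chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
        self.fragments = [Fragment(i, c) for i, c in enumerate(chunks)]
        return [f.fragment_id for f in self.fragments]

    # (2) Summarize, hide, and restore: replace a fragment with a short summary
    # in the working context while keeping the full text recoverable on demand.
    def hide(self, fragment_id: int, summary: str) -> None:
        frag = self.fragments[fragment_id]
        frag.summary, frag.hidden = summary, True

    def restore(self, fragment_id: int) -> str:
        frag = self.fragments[fragment_id]
        frag.hidden = False
        return frag.text

    # (3) Intelligent search: find fragments (hidden or visible) matching a query,
    # so the model can pull relevant details back into focus.
    def search(self, query: str) -> list[int]:
        return [f.fragment_id for f in self.fragments if query.lower() in f.text.lower()]

    # The context the model actually sees: summaries stand in for hidden fragments.
    def render(self) -> str:
        return "\n".join(
            f"[{f.fragment_id}] {f.summary}" if f.hidden else f"[{f.fragment_id}] {f.text}"
            for f in self.fragments
        )
```

In this sketch, the model would call fragment_context once, hide fragments it deems irrelevant behind short summaries, and later use search and restore to bring details back when needed, so its effective working context stays small and focused rather than growing monotonically with the raw token window.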