LightThinker++: From Reasoning Compression to Memory Management
April 4, 2026
Authors: Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the growing cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. Static compression, however, often struggles with complex reasoning, where the irreversible loss of intermediate details can create logical bottlenecks. To address this, we evolve the framework into LightThinker++, which introduces Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, supported by a specialized trajectory-synthesis pipeline that trains purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions: (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss; (2) in standard reasoning tasks, LightThinker++ cuts peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget; (3) most notably, in long-horizon agentic tasks, it maintains a stable memory footprint beyond 80 rounds (a 60%-70% reduction) and achieves an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.
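To make the core idea concrete, the following is a minimal sketch of explicit memory management under a fixed context budget. All names here (`MemoryStore`, `write`, `context`) are illustrative assumptions, not the paper's actual API, and real semantic compression is stood in for by naive first-sentence truncation; the point is only to show how compact summaries plus an explicit eviction policy keep the peak working context bounded as reasoning steps accumulate.

```python
# Illustrative sketch only: names and the truncation-based "compression"
# are assumptions, not the LightThinker++ implementation.

class MemoryStore:
    """Holds compact summaries of past reasoning steps under a size budget."""

    def __init__(self, budget: int):
        self.budget = budget          # max characters kept in working context
        self.slots: list[str] = []    # compact semantic representations

    def write(self, thought: str) -> None:
        # Compression stand-in: keep only the first sentence of the thought.
        summary = thought.split(".")[0].strip() + "."
        self.slots.append(summary)
        self._evict()

    def _evict(self) -> None:
        # Explicit scheduling policy: drop the oldest summaries once the
        # total size exceeds the budget (always keep the newest one).
        while sum(len(s) for s in self.slots) > self.budget and len(self.slots) > 1:
            self.slots.pop(0)

    def context(self) -> str:
        # The compact working context fed back into the next reasoning step.
        return " ".join(self.slots)


mem = MemoryStore(budget=120)
for thought in [
    "Step 1: parse the question. The entities are A and B.",
    "Step 2: A equals 3 by the first constraint. Lengthy scratch work follows.",
    "Step 3: B equals 4 by substitution. More intermediate detail here.",
]:
    mem.write(thought)

# Peak working context stays within the fixed budget regardless of step count.
assert len(mem.context()) <= mem.budget
```

A real system would replace the truncation with model-generated summaries and could use smarter scheduling (e.g. relevance-based rather than oldest-first eviction); the bounded-footprint property demonstrated here is what enables stable memory usage over long horizons.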