LightThinker++: From Reasoning Compression to Memory Management

Abstract

Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compression often struggles with complex reasoning, where the irreversible loss of intermediate details can lead to logical bottlenecks. To address this, we evolve the framework into LightThinker++, which introduces Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, and it is supported by a specialized trajectory synthesis pipeline that trains purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions. (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ cuts peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget for maximum performance. (3) Most notably, in long-horizon agentic tasks, it maintains a stable memory footprint beyond 80 rounds (a 60%-70% reduction) and achieves an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.
