
Free(): Learning to Forget in Malloc-Only Reasoning Models

February 8, 2026
Authors: Yilun Zheng, Dongyang Ma, Tian Liang, Jiahao Xu, Xinting Huang, Lihui Chen, Haitao Mi, Yan Wang
cs.AI

Abstract

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state. Extensive experiments show that Free()LM provides consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines, even establishing a new SOTA on IMOanswerBench using DeepSeek V3.2-Speciale. Most notably, in long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores performance to 50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.
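The abstract describes an iterative loop that alternates between a reasoning pass and a cleaning pass over the accumulated context. Below is a minimal Python sketch of that loop structure, not the paper's implementation: the model interface (`generate_step`, `is_final_answer`, `score_chunk_utility`) and the threshold are hypothetical names, and the paper's actual Free-Module is a trained LoRA adapter rather than the explicit per-chunk scorer assumed here.

```python
# Hypothetical sketch of the reason/clean loop sketched in the abstract.
# All interface names and the keep_threshold are assumptions for
# illustration; the real Free-Module is a plug-and-play LoRA adapter.

def free_lm_loop(model, prompt, max_rounds=8, keep_threshold=0.5):
    """Alternate between a reasoning pass and a cleaning pass.

    The context is kept as a list of chunks so the cleaning pass can
    drop whole reasoning steps, mimicking free() on allocated blocks.
    """
    context = [prompt]
    for _ in range(max_rounds):
        # Reasoning mode: append a new chunk of thinking tokens.
        new_chunk = model.generate_step("".join(context))
        context.append(new_chunk)
        if model.is_final_answer(new_chunk):
            break
        # Cleaning mode: score each chunk's usefulness and free the
        # low-utility ones, always keeping the original prompt and
        # the chunk generated in this round.
        scores = [model.score_chunk_utility(c, context) for c in context]
        context = [
            c for i, (c, s) in enumerate(zip(context, scores))
            if i == 0 or c is new_chunk or s >= keep_threshold
        ]
    return context[-1]
```

The list-of-chunks representation is the key design choice in this sketch: it lets the cleaning pass remove obsolete reasoning steps wholesale, keeping the working context compact instead of letting it grow monotonically as in a "malloc-only" model.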