Free(): Learning to Forget in Malloc-Only Reasoning Models

Abstract

Reasoning models improve problem solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike, with no mechanism for pruning obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state. Extensive experiments show that Free()LM provides consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines and establishes a new SOTA on IMOanswerBench using DeepSeek V3.2-Speciale. Most notably, in long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores performance to 50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.
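
The alternation between reasoning and cleaning modes can be pictured with a small toy loop. This is only a sketch under stated assumptions, not the authors' implementation: `toy_reason`, `toy_is_useless`, `run_free_loop`, and the `clean_every` parameter are hypothetical names introduced here, and the learned Free-Module (a LoRA adapter in the paper) is replaced by a trivial string check.

```python
from typing import Callable, List


def toy_reason(context: List[str], step: int) -> str:
    """Hypothetical stand-in for reasoning mode: emit one new context chunk.

    Every third chunk is tagged as a dead end so the cleaner has work to do.
    """
    tag = "dead-end" if step % 3 == 2 else "useful"
    return f"[{tag}] step {step}"


def toy_is_useless(chunk: str) -> bool:
    """Hypothetical stand-in for the Free-Module's judgment of a chunk."""
    return chunk.startswith("[dead-end]")


def run_free_loop(
    question: str,
    n_steps: int,
    clean_every: int,
    reason: Callable[[List[str], int], str],
    is_useless: Callable[[str], bool],
) -> List[str]:
    """Alternate between appending chunks (malloc) and pruning them (free)."""
    context = [question]
    for step in range(n_steps):
        # Reasoning mode: the context only grows.
        context.append(reason(context, step))
        # Cleaning mode: periodically drop chunks judged useless,
        # always keeping the original question at index 0.
        if (step + 1) % clean_every == 0:
            context = [context[0]] + [c for c in context[1:] if not is_useless(c)]
    return context


if __name__ == "__main__":
    for chunk in run_free_loop("Q", 6, 3, toy_reason, toy_is_useless):
        print(chunk)
```

In this toy setting the context stays bounded by the number of chunks the cleaner keeps, whereas a malloc-only loop would retain every chunk it ever produced.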
