LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
March 2, 2025
Authors: Toby Simonds, Akira Yoshiyama
cs.AI
Abstract
We introduce LADDER (Learning through Autonomous Difficulty-Driven Example
Recursion), a framework which enables Large Language Models to autonomously
improve their problem-solving capabilities through self-guided learning by
recursively generating and solving progressively simpler variants of complex
problems. Unlike prior approaches that require curated datasets or human
feedback, LADDER leverages a model's own capabilities to generate easier
question variants. We demonstrate LADDER's effectiveness in the subject of
mathematical integration, improving Llama 3.2 3B's accuracy from 1% to 82% on
undergraduate-level problems and enabling Qwen2.5 7B Deepseek-R1 Distilled to
achieve 73% on the MIT Integration Bee qualifying examination. We also
introduce TTRL (Test-Time Reinforcement Learning), where we perform
reinforcement learning on variants of test problems at inference time. TTRL
enables Qwen2.5 7B Deepseek-R1 Distilled to achieve a state-of-the-art score of
90% on the MIT Integration Bee qualifying examination, surpassing OpenAI o1's
performance. These results show how self-directed strategic learning can
achieve significant capability improvements without relying on architectural
scaling or human supervision.
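
The sketch below is a minimal, illustrative reading of the two mechanisms named in the abstract: recursively asking the model for progressively simpler variants of a hard problem (the LADDER curriculum), and a verifiable reward for the integration domain obtained by numerically checking a candidate antiderivative. All function names here (`ask_model_for_simpler_variants`, `build_ladder`, `antiderivative_reward`) are hypothetical placeholders, not the authors' implementation; TTRL, as described, would apply the same kind of reinforcement-learning update to variants generated from each test problem at inference time.

```python
# Illustrative sketch only, not the paper's code.
# (1) Recursively expand a hard problem into a ladder of simpler variants.
# (2) Score a proposed antiderivative with a numeric, label-free reward.

import math
import random


def ask_model_for_simpler_variants(problem: str, n: int = 2) -> list[str]:
    """Placeholder for an LLM call that proposes easier variants of `problem`."""
    return [f"{problem} (simplified #{i})" for i in range(n)]


def build_ladder(problem: str, depth: int) -> list[str]:
    """Recursively collect the problem plus progressively simpler variants."""
    if depth == 0:
        return [problem]
    ladder = [problem]
    for easier in ask_model_for_simpler_variants(problem):
        ladder.extend(build_ladder(easier, depth - 1))
    return ladder


def antiderivative_reward(integrand, candidate_antiderivative,
                          a: float = 0.0, b: float = 1.0) -> float:
    """Reward 1.0 if d/dx(candidate) matches the integrand at random points."""
    h = 1e-5
    for _ in range(20):
        x = random.uniform(a, b)
        deriv = (candidate_antiderivative(x + h)
                 - candidate_antiderivative(x - h)) / (2 * h)
        if not math.isclose(deriv, integrand(x), rel_tol=1e-3, abs_tol=1e-3):
            return 0.0
    return 1.0


if __name__ == "__main__":
    # A two-level ladder of (placeholder) simpler variants of one problem.
    print(build_ladder("integrate x*exp(x^2) dx", depth=2))
    # exp(x^2)/2 is a correct antiderivative of x*exp(x^2), so reward is 1.0.
    print(antiderivative_reward(lambda x: x * math.exp(x**2),
                                lambda x: math.exp(x**2) / 2))
```

In a training loop, rewards of this kind would feed a policy-gradient update on each variant, easiest first, so the model climbs the ladder toward the original problem; the abstract does not specify the RL algorithm, so that detail is left abstract here.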