LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

March 2, 2025
作者: Toby Simonds, Akira Yoshiyama
cs.AI

Abstract

We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework which enables Large Language Models to autonomously improve their problem-solving capabilities through self-guided learning by recursively generating and solving progressively simpler variants of complex problems. Unlike prior approaches that require curated datasets or human feedback, LADDER leverages a model's own capabilities to generate easier question variants. We demonstrate LADDER's effectiveness in the subject of mathematical integration, improving Llama 3.2 3B's accuracy from 1% to 82% on undergraduate-level problems and enabling Qwen2.5 7B Deepseek-R1 Distilled to achieve 73% on the MIT Integration Bee qualifying examination. We also introduce TTRL (Test-Time Reinforcement Learning), where we perform reinforcement learning on variants of test problems at inference time. TTRL enables Qwen2.5 7B Deepseek-R1 Distilled to achieve a state-of-the-art score of 90% on the MIT Integration Bee qualifying examination, surpassing OpenAI o1's performance. These results show how self-directed strategic learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
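
For a concrete picture of the loop the abstract describes, the sketch below shows one way a LADDER-style procedure could be organized: the model recursively generates progressively simpler variants of a hard problem, then attempts, verifies, and reinforces solutions from the easiest level back up to the original. This is a minimal illustrative sketch, not the paper's implementation; every helper here (generate_simpler_variants, solve, verify_solution, reinforce) is a hypothetical placeholder for the model-driven variant generation, solution attempts, answer verification, and reinforcement-learning updates that the abstract alludes to.

```python
"""Minimal sketch of a LADDER-style self-improvement loop.

All helpers below are hypothetical stand-ins, not the paper's API:
a real setup would prompt the model for easier variants, check
integration answers numerically, and apply an RL update to the model.
"""

import random
from typing import List


def generate_simpler_variants(problem: str, n: int = 3) -> List[str]:
    # Placeholder: in LADDER the model itself proposes easier variants
    # (e.g. simpler integrands) of the given problem.
    return [f"{problem} [simpler variant {i}]" for i in range(n)]


def solve(problem: str) -> str:
    # Placeholder for the model's attempted solution.
    return f"candidate_solution({problem})"


def verify_solution(problem: str, attempt: str) -> bool:
    # Placeholder verifier; a real integration setup would check the
    # attempt numerically instead of flipping a coin.
    return random.random() < 0.5


def reinforce(problem: str, attempt: str, reward: float) -> None:
    # Placeholder for a reinforcement-learning update on the model.
    pass


def ladder_self_improve(hard_problem: str, depth: int = 3) -> None:
    """Build a ladder of progressively simpler variants, then train
    from the easiest level back up to the original problem."""
    levels = [[hard_problem]]
    for _ in range(depth):
        levels.append([v for p in levels[-1]
                       for v in generate_simpler_variants(p)])

    for level in reversed(levels):  # easiest variants first
        for variant in level:
            attempt = solve(variant)
            reward = 1.0 if verify_solution(variant, attempt) else 0.0
            reinforce(variant, attempt, reward)


if __name__ == "__main__":
    ladder_self_improve("integrate x * exp(x**2) dx")
```

Applied to variants of a specific test question at inference time rather than to a training set, the same idea corresponds to what the abstract calls TTRL (Test-Time Reinforcement Learning).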
