LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

Abstract

We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework that enables large language models (LLMs) to autonomously improve their problem-solving capabilities through self-guided learning, recursively generating and solving progressively simpler variants of complex problems. Unlike prior approaches that require curated datasets or human feedback, LADDER leverages a model's own capabilities to generate easier question variants. We demonstrate LADDER's effectiveness on mathematical integration, improving Llama 3.2 3B's accuracy from 1% to 82% on undergraduate-level problems and enabling Qwen2.5 7B DeepSeek-R1 Distilled to achieve 73% on the MIT Integration Bee qualifying examination. We also introduce TTRL (Test-Time Reinforcement Learning), in which we perform reinforcement learning on variants of test problems at inference time. TTRL enables Qwen2.5 7B DeepSeek-R1 Distilled to achieve a state-of-the-art score of 90% on the MIT Integration Bee qualifying examination, surpassing OpenAI o1's performance. These results show that self-directed strategic learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
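
The recursive variant generation described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how variants are produced beyond the model generating them itself, so `toy_propose_easier` is a hypothetical stand-in for that model call, and `Node`, `build_variant_tree`, `curriculum`, and `max_depth` are names invented here. The solving and training steps are omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One problem in the variant tree (hypothetical structure)."""
    problem: str
    depth: int
    children: List["Node"] = field(default_factory=list)


def build_variant_tree(
    problem: str,
    propose_easier: Callable[[str], List[str]],
    max_depth: int,
    depth: int = 0,
) -> Node:
    """Recursively expand a problem into progressively simpler variants."""
    node = Node(problem, depth)
    if depth < max_depth:
        for variant in propose_easier(problem):
            child = build_variant_tree(variant, propose_easier, max_depth, depth + 1)
            node.children.append(child)
    return node


def curriculum(root: Node) -> List[str]:
    """Flatten the tree so the deepest (simplest) variants come first."""
    nodes, stack = [], [root]
    while stack:
        current = stack.pop()
        nodes.append(current)
        stack.extend(current.children)
    nodes.sort(key=lambda n: -n.depth)
    return [n.problem for n in nodes]


def toy_propose_easier(problem: str) -> List[str]:
    """Assumption: stands in for the LLM. Drops one term from a sum of terms."""
    terms = problem.split(" + ")
    if len(terms) <= 1:
        return []  # already a single term, nothing simpler to propose
    return [" + ".join(terms[:-1]), " + ".join(terms[1:])]


if __name__ == "__main__":
    root = build_variant_tree("x**3 + sin(x) + exp(x)", toy_propose_easier, max_depth=2)
    for item in curriculum(root):
        print(item)
```

In this toy setup the original integrand comes last in the ordering, after its single-term and two-term variants. In a real system the variant proposer would be the language model itself.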

