Self-Steering Language Models

Abstract

While test-time reasoning enables language models (LMs) to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure: both how to verify solutions and how to search for them. This paper introduces DisCIPL, a method for "self-steering" LMs in which a Planner model generates a task-specific inference program that is executed by a population of Follower models. Our approach equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning. When instantiated with a small Follower (e.g., Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. By decoupling planning from execution, our work opens up a design space of highly parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing LMs.
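The abstract contrasts standard best-of-N sampling with Monte Carlo inference over a population of Followers that can check partial generations. The toy sketch below illustrates that contrast under stated assumptions: a uniform random digit sampler stands in for a Follower LM, and the task (eight digits summing to 60) and every name in it (`follower_step`, `prefix_ok`, `best_of_n`, `smc`) are hypothetical. The `smc` loop is a generic particle filter with 0/1 weights, not DisCIPL's actual Planner-generated inference program.

```python
import random
from typing import List, Optional

# Hypothetical toy task: emit LENGTH digits whose sum is exactly TARGET.
# A uniform sampler stands in for a small Follower LM (it is NOT a real LM).
TOKENS = list(range(10))
LENGTH = 8
TARGET = 60


def follower_step(prefix: List[int], rng: random.Random) -> int:
    """Stand-in for a Follower proposing the next token given a prefix."""
    return rng.choice(TOKENS)


def prefix_ok(prefix: List[int]) -> bool:
    """Partial verifier: can this prefix still be completed to hit TARGET?"""
    s = sum(prefix)
    remaining = LENGTH - len(prefix)
    return s <= TARGET and s + 9 * remaining >= TARGET


def complete_ok(seq: List[int]) -> bool:
    """Full verifier for a finished sequence."""
    return len(seq) == LENGTH and sum(seq) == TARGET


def best_of_n(n: int, rng: random.Random) -> Optional[List[int]]:
    """Sample n full sequences independently; return the first valid one."""
    for _ in range(n):
        seq: List[int] = []
        for _ in range(LENGTH):
            seq.append(follower_step(seq, rng))
        if complete_ok(seq):
            return seq
    return None


def smc(n: int, rng: random.Random) -> Optional[List[int]]:
    """Grow n particles in lockstep; after each token, drop prefixes that
    can no longer succeed and resample uniformly among the survivors."""
    particles: List[List[int]] = [[] for _ in range(n)]
    for _ in range(LENGTH):
        extended = [p + [follower_step(p, rng)] for p in particles]
        alive = [p for p in extended if prefix_ok(p)]
        if not alive:
            return None  # every particle hit a dead end
        # Multinomial resampling with equal weight on each surviving prefix.
        particles = [list(rng.choice(alive)) for _ in range(n)]
    valid = [p for p in particles if complete_ok(p)]
    return valid[0] if valid else None


if __name__ == "__main__":
    seeds = range(20)
    bon_hits = sum(best_of_n(64, random.Random(s)) is not None for s in seeds)
    smc_hits = sum(smc(64, random.Random(s)) is not None for s in seeds)
    print(f"best-of-N: {bon_hits}/20 runs valid, SMC: {smc_hits}/20 runs valid")
```

With the same budget of 64 samples, the lockstep version almost always returns a valid sequence in this toy setting, while independent best-of-N sampling rarely does, because the partial verifier prunes doomed prefixes early instead of only scoring finished ones.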
