Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Abstract

Advancing the reasoning skills of large language models (LLMs) has attracted wide interest. However, current post-training techniques rely heavily on supervisory signals, such as outcome supervision or auxiliary reward models, and therefore face problems of scalability and high annotation cost. This motivates us to enhance LLM reasoning without external supervision. We introduce Genius, a generalizable and purely unsupervised self-training framework. Without any external auxiliary signal, Genius must seek the optimal response sequence in a stepwise manner and optimize the LLM accordingly. To explore potential steps and exploit the optimal ones, Genius introduces a stepwise foresight re-sampling strategy that samples candidate steps and estimates their value by simulating future outcomes. Further, we recognize that the unsupervised setting inevitably induces intrinsic noise and uncertainty. To make optimization robust, we propose an advantage-calibrated optimization (ACO) loss function that mitigates estimation inconsistencies. Combining these techniques, Genius provides an advanced initial step toward self-improving LLM reasoning with general queries and without supervision, revolutionizing reasoning scaling laws given the vast availability of general queries. The code will be released at https://github.com/xufangzhi/Genius.
