Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Abstract

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). Several issues affect the quality of these models: limited imitation signals from shallow LFM outputs, small-scale homogeneous training data, and, most notably, a lack of rigorous evaluation, which leads to overestimating the small models' capability because they tend to learn to imitate the style, but not the reasoning process, of LFMs. To address these challenges, we develop Orca, a 13-billion parameter model that learns to imitate the reasoning process of LFMs. (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy, to be published at https://aka.ms/orca-lm.) Orca learns from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and by 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (a 4-point gap with an optimized system message) on professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without chain-of-thought (CoT), while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether generated by humans or by more advanced AI models, is a promising direction for improving model capabilities and skills.
