Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies

Abstract

Large language models (LLMs) excel at complex tasks through advanced prompting techniques such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but their reliance on manually crafted, task-specific prompts limits adaptability and efficiency. We introduce Mixture of Reasoning (MoR), a training framework that embeds diverse reasoning strategies into LLMs for autonomous, task-adaptive reasoning without external prompt engineering. MoR has two phases: Thought Generation, which creates reasoning chain templates with models like GPT-4o, and SFT Dataset Construction, which pairs templates with benchmark datasets for supervised fine-tuning. Our experiments show that MoR significantly enhances performance, with MoR150 achieving 0.730 (2.2% improvement) using CoT prompting and 0.734 (13.5% improvement) compared to baselines. MoR eliminates the need for task-specific prompts, offering a generalizable solution for robust reasoning across diverse tasks.
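The second phase, pairing reasoning templates with benchmark data, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how templates are assigned to questions, what the record fields are, or how records are rendered into training text. The names `BenchmarkItem` and `build_sft_records` are hypothetical, and the uniform random assignment is a stand-in, not the authors' procedure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    """One benchmark example (hypothetical schema)."""
    question: str
    answer: str


def build_sft_records(templates, items, seed=0):
    """Pair each benchmark item with one reasoning template.

    Assumption: a template is picked uniformly at random per item.
    The abstract does not specify the actual pairing rule.
    """
    if not templates:
        raise ValueError("at least one reasoning template is required")
    rng = random.Random(seed)  # seeded so the pairing is reproducible
    records = []
    for item in items:
        records.append({
            "template": rng.choice(templates),
            "question": item.question,
            "answer": item.answer,
        })
    return records


# Toy inputs; real templates would come from the Thought Generation phase.
templates = [
    "Break the problem into steps and solve each in order.",
    "Consider several candidate approaches, then pick the best one.",
]
items = [
    BenchmarkItem("What is 12 * 7?", "84"),
    BenchmarkItem("Is 91 prime?", "No"),
    BenchmarkItem("What is the capital of France?", "Paris"),
]
records = build_sft_records(templates, items, seed=0)
```

Keeping the template, question, and answer as separate fields leaves the choice of prompt layout open, since the abstract does not describe how the fine-tuning text is formatted.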
