Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies

July 1, 2025
Authors: Tao Xiong, Xavier Hu, Wenyan Fan, Shengyu Zhang
cs.AI

Abstract

Large language models (LLMs) excel at complex tasks through advanced prompting techniques such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but their reliance on manually crafted, task-specific prompts limits adaptability and efficiency. We introduce Mixture of Reasoning (MoR), a training framework that embeds diverse reasoning strategies into LLMs for autonomous, task-adaptive reasoning without external prompt engineering. MoR has two phases: Thought Generation, which creates reasoning-chain templates with models such as GPT-4o, and SFT Dataset Construction, which pairs those templates with benchmark datasets for supervised fine-tuning. Our experiments show that MoR significantly improves performance, with MoR150 achieving 0.730 (a 2.2% improvement) when using CoT prompting and 0.734 (a 13.5% improvement) compared to baselines. MoR eliminates the need for task-specific prompts, offering a generalizable solution for robust reasoning across diverse tasks.
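
The abstract describes a two-phase pipeline: generating reasoning-chain templates with a strong model, then pairing those templates with benchmark examples to build supervised fine-tuning data. Below is a minimal sketch of how such a pipeline could be composed; the helper names (`generate_thought_templates`, `build_sft_dataset`), the prompt wording, and the round-robin pairing are illustrative assumptions, not the authors' released code.

```python
# Sketch of the two MoR phases under the assumptions stated above.
from openai import OpenAI

client = OpenAI()

def generate_thought_templates(n_templates: int) -> list[str]:
    """Phase 1 (Thought Generation): ask a strong model (e.g. GPT-4o)
    for reusable, task-agnostic reasoning-chain templates."""
    templates = []
    for _ in range(n_templates):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "Write a general, task-agnostic reasoning-chain "
                           "template that a model could follow step by step.",
            }],
        )
        templates.append(resp.choices[0].message.content)
    return templates

def build_sft_dataset(templates: list[str], benchmark: list[dict]) -> list[dict]:
    """Phase 2 (SFT Dataset Construction): pair each benchmark example
    with a reasoning template to form prompt/response records for SFT."""
    records = []
    for i, example in enumerate(benchmark):
        template = templates[i % len(templates)]  # simple round-robin pairing (assumption)
        records.append({
            "prompt": f"{template}\n\nQuestion: {example['question']}",
            "response": example["answer"],
        })
    return records
```

For example, `build_sft_dataset(generate_thought_templates(150), benchmark)` would yield records in which each benchmark question is preceded by one of 150 reasoning templates, which the resulting dataset then uses for supervised fine-tuning.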