Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies

July 1, 2025
Authors: Tao Xiong, Xavier Hu, Wenyan Fan, Shengyu Zhang
cs.AI

Abstract

Large language models (LLMs) excel at complex tasks through advanced prompting techniques such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but their reliance on manually crafted, task-specific prompts limits adaptability and efficiency. We introduce Mixture of Reasoning (MoR), a training framework that embeds diverse reasoning strategies into LLMs for autonomous, task-adaptive reasoning without external prompt engineering. MoR has two phases: Thought Generation, which creates reasoning-chain templates with models such as GPT-4o, and SFT Dataset Construction, which pairs those templates with benchmark datasets for supervised fine-tuning. Our experiments show that MoR significantly improves performance, with MoR150 achieving 0.730 (a 2.2% improvement) when using CoT prompting and 0.734 (a 13.5% improvement) compared to baselines. MoR eliminates the need for task-specific prompts, offering a generalizable solution for robust reasoning across diverse tasks.
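
The abstract describes a two-phase pipeline: generating reasoning-chain templates with a strong model, then pairing those templates with benchmark examples to build supervised fine-tuning data. Below is a minimal sketch of how such a pipeline could be composed; the helper names (`generate_thought_templates`, `build_sft_dataset`), the prompt wording, and the round-robin pairing are illustrative assumptions, not the authors' released code.

```python
# Sketch of the two MoR phases under the assumptions stated above.
from openai import OpenAI

client = OpenAI()

def generate_thought_templates(n_templates: int) -> list[str]:
    """Phase 1 (Thought Generation): ask a strong model (e.g. GPT-4o)
    for reusable, task-agnostic reasoning-chain templates."""
    templates = []
    for _ in range(n_templates):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "Write a general, task-agnostic reasoning-chain "
                           "template that a model could follow step by step.",
            }],
        )
        templates.append(resp.choices[0].message.content)
    return templates

def build_sft_dataset(templates: list[str], benchmark: list[dict]) -> list[dict]:
    """Phase 2 (SFT Dataset Construction): pair each benchmark example
    with a reasoning template to form prompt/response records for SFT."""
    records = []
    for i, example in enumerate(benchmark):
        template = templates[i % len(templates)]  # simple round-robin pairing (assumption)
        records.append({
            "prompt": f"{template}\n\nQuestion: {example['question']}",
            "response": example["answer"],
        })
    return records
```

For example, `build_sft_dataset(generate_thought_templates(150), benchmark)` would yield records in which each benchmark question is preceded by one of 150 reasoning templates, which the resulting dataset then uses for supervised fine-tuning.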