Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies
July 1, 2025
Authors: Tao Xiong, Xavier Hu, Wenyan Fan, Shengyu Zhang
cs.AI
Abstract
Large language models (LLMs) excel in complex tasks through advanced
prompting techniques like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but
their reliance on manually crafted, task-specific prompts limits adaptability
and efficiency. We introduce Mixture of Reasoning (MoR), a training framework
that embeds diverse reasoning strategies into LLMs for autonomous,
task-adaptive reasoning without external prompt engineering. MoR has two
phases: Thought Generation, creating reasoning chain templates with models like
GPT-4o, and SFT Dataset Construction, pairing templates with benchmark datasets
for supervised fine-tuning. Our experiments show that MoR significantly enhances
performance, with MoR150 achieving 0.730 (a 2.2% improvement) with CoT prompting
and 0.734 (a 13.5% improvement) over baselines. MoR eliminates the need
for task-specific prompts, offering a generalizable solution for robust
reasoning across diverse tasks.
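As a rough conceptual sketch (not the authors' implementation), the SFT Dataset Construction phase can be pictured as pairing generated reasoning-chain templates with benchmark items to form fine-tuning examples. All names and data below are hypothetical:

```python
import random

def build_sft_dataset(templates, benchmark, seed=0):
    """Pair each benchmark item with a reasoning-chain template to form
    (prompt, target) examples for supervised fine-tuning.

    templates: list of reasoning-strategy template strings (in MoR these
               would come from a stronger model such as GPT-4o).
    benchmark: list of dicts with 'question' and 'answer' keys.
    """
    rng = random.Random(seed)
    dataset = []
    for item in benchmark:
        # Sample one reasoning strategy per example so the fine-tuned
        # model sees a mixture of strategies across the dataset.
        template = rng.choice(templates)
        prompt = f"{template}\n\nQuestion: {item['question']}"
        dataset.append({"prompt": prompt, "target": item["answer"]})
    return dataset

# Tiny usage example with made-up data
templates = ["Let's think step by step.", "Break the problem into subgoals."]
benchmark = [{"question": "2 + 3 = ?", "answer": "5"}]
sft_data = build_sft_dataset(templates, benchmark)
print(sft_data[0]["target"])  # prints: 5
```

The resulting (prompt, target) pairs would then feed a standard supervised fine-tuning loop; the key idea the sketch illustrates is that strategy selection is baked into the training data rather than supplied at inference time.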