Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies
July 1, 2025
Authors: Tao Xiong, Xavier Hu, Wenyan Fan, Shengyu Zhang
cs.AI
Abstract
Large language models (LLMs) excel in complex tasks through advanced
prompting techniques like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but
their reliance on manually crafted, task-specific prompts limits adaptability
and efficiency. We introduce Mixture of Reasoning (MoR), a training framework
that embeds diverse reasoning strategies into LLMs for autonomous,
task-adaptive reasoning without external prompt engineering. MoR has two
phases: Thought Generation, creating reasoning chain templates with models like
GPT-4o, and SFT Dataset Construction, pairing templates with benchmark datasets
for supervised fine-tuning. Our experiments show that MoR significantly enhances
performance, with MoR150 achieving 0.730 (a 2.2% improvement) with CoT prompting
and 0.734 (a 13.5% improvement) over baselines. MoR eliminates the need
for task-specific prompts, offering a generalizable solution for robust
reasoning across diverse tasks.
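As a rough conceptual sketch (not the authors' implementation), the SFT Dataset Construction phase can be pictured as pairing generated reasoning-chain templates with benchmark items to form fine-tuning examples. All names and data below are hypothetical:

```python
import random

def build_sft_dataset(templates, benchmark, seed=0):
    """Pair each benchmark item with a reasoning-chain template to form
    (prompt, target) examples for supervised fine-tuning.

    templates: list of reasoning-strategy template strings (in MoR these
               would come from a stronger model such as GPT-4o).
    benchmark: list of dicts with 'question' and 'answer' keys.
    """
    rng = random.Random(seed)
    dataset = []
    for item in benchmark:
        # Sample one reasoning strategy per example so the fine-tuned
        # model sees a mixture of strategies across the dataset.
        template = rng.choice(templates)
        prompt = f"{template}\n\nQuestion: {item['question']}"
        dataset.append({"prompt": prompt, "target": item["answer"]})
    return dataset

# Tiny usage example with made-up data
templates = ["Let's think step by step.", "Break the problem into subgoals."]
benchmark = [{"question": "2 + 3 = ?", "answer": "5"}]
sft_data = build_sft_dataset(templates, benchmark)
print(sft_data[0]["target"])  # prints: 5
```

The resulting (prompt, target) pairs would then feed a standard supervised fine-tuning loop; the key idea the sketch illustrates is that strategy selection is baked into the training data rather than supplied at inference time.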