Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models
February 9, 2026
Authors: Mingzi Cao, Xingwei Tan, Mahmud Akhter, Marco Valentino, Maria Liakata, Xi Wang, Nikolaos Aletras
cs.AI
Abstract
Deduction, induction, and abduction are the three fundamental reasoning paradigms at the core of human logical thinking. Although improving the reasoning abilities of Large Language Models (LLMs) has attracted significant research effort, the extent to which these fundamental paradigms induce generalization has yet to be systematically explored. In this study, we shed light on how the interplay between these core paradigms influences LLMs' reasoning behavior. To this end, we first collect a new dataset of reasoning trajectories from symbolic tasks, each targeting one of the three fundamental paradigms, to abstract away from concrete world knowledge. We then investigate effective ways of inducing these skills into LLMs, experimenting with a battery of methods ranging from simple fine-tuning to more complex approaches that increase model depth or transform a dense model into a mixture-of-experts. We comprehensively evaluate the induced models on realistic out-of-domain tasks that are formulated entirely in natural language and involve real-world knowledge. Our results reveal that our approach yields strong generalization, with substantial performance gains (up to 14.60 points) across realistic tasks.
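The abstract does not specify how the dense-to-mixture-of-experts transformation is carried out. As a point of reference only, the sketch below illustrates one common recipe for this kind of conversion, often called sparse upcycling: each expert is initialized as a copy of the original dense feed-forward block, and a learned router dispatches each token to its top-k experts. The class name MoEFromDense, the expert count, and the top-k routing parameters are illustrative assumptions, not the paper's implementation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFromDense(nn.Module):
    """Hypothetical sketch: upcycle a dense FFN into a sparse MoE layer
    by cloning the dense weights into each expert (sparse upcycling)."""

    def __init__(self, dense_ffn: nn.Module, d_model: int,
                 num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # Each expert starts as a copy of the original dense FFN, so the
        # upcycled layer initially computes the same function as the dense one.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_ffn) for _ in range(num_experts)
        )
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); route each token to its top-k experts.
        logits = self.router(x)                           # (B, S, E)
        weights, idx = logits.topk(self.top_k, dim=-1)    # (B, S, k)
        weights = F.softmax(weights, dim=-1)              # normalize over k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens sent to e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

In an actual conversion, a layer like this would replace each transformer block's FFN before continued fine-tuning on the reasoning trajectories, so that only the router and the expert specialization are learned from scratch.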