
When Reasoning Meets Its Laws

December 19, 2025
Authors: Junyu Zhang, Yifan Sun, Tianang Leng, Jingyan Shen, Liu Ziyin, Paul Pu Liang, Huan Zhang
cs.AI

Abstract

Despite the superior performance of Large Reasoning Models (LRMs), their reasoning behaviors are often counterintuitive, leading to suboptimal reasoning capabilities. To theoretically formalize the desired reasoning behaviors, this paper presents the Laws of Reasoning (LoRe), a unified framework that characterizes intrinsic reasoning patterns in LRMs. We first propose the compute law, with the hypothesis that reasoning compute should scale linearly with question complexity. Beyond compute, we extend LoRe with a supplementary accuracy law. Since question complexity is difficult to quantify in practice, we examine these hypotheses via two properties of the laws: monotonicity and compositionality. We therefore introduce LoRe-Bench, a benchmark that systematically measures these two tractable properties in large reasoning models. Evaluation shows that most reasoning models exhibit reasonable monotonicity but lack compositionality. In response, we develop an effective finetuning approach that enforces compute-law compositionality. Extensive empirical studies demonstrate that better compliance with the compute law yields consistently improved reasoning performance on multiple benchmarks and uncover synergistic effects across properties and laws. Project page: https://lore-project.github.io/
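
Concretely, the compute law can be read as reasoning compute ≈ α · complexity + β; since complexity is hard to measure directly, the paper instead tests two observable consequences, monotonicity and compositionality. The sketch below is our illustration of how such checks could be phrased over measured reasoning-token counts, not the paper's actual protocol: `count_reasoning_tokens` and `compose` are hypothetical stand-ins for whatever LoRe-Bench measures.

```python
from typing import Callable, List, Tuple

def check_monotonicity(
    questions_by_complexity: List[str],           # ordered easy -> hard
    count_reasoning_tokens: Callable[[str], int]  # hypothetical compute proxy
) -> bool:
    """Compute-law monotonicity: a harder question should never take
    fewer reasoning tokens than an easier one."""
    tokens = [count_reasoning_tokens(q) for q in questions_by_complexity]
    return all(a <= b for a, b in zip(tokens, tokens[1:]))

def check_compositionality(
    pairs: List[Tuple[str, str]],
    compose: Callable[[str, str], str],           # hypothetical q1 ∘ q2 operator
    count_reasoning_tokens: Callable[[str], int],
    rel_tol: float = 0.2
) -> float:
    """Compute-law compositionality: compute spent on a composed question
    should be close to the sum over its parts (linear scaling).
    Returns the fraction of (non-empty) pairs satisfying this within rel_tol."""
    hits = 0
    for q1, q2 in pairs:
        whole = count_reasoning_tokens(compose(q1, q2))
        parts = count_reasoning_tokens(q1) + count_reasoning_tokens(q2)
        if abs(whole - parts) <= rel_tol * parts:
            hits += 1
    return hits / len(pairs)
```

The 20% relative tolerance is an arbitrary placeholder for illustration; LoRe-Bench presumably defines its own scoring of these properties.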