
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute

June 18, 2025
Authors: Sheng Liu, Tianlang Chen, Pan Lu, Haotian Ye, Yizheng Chen, Lei Xing, James Zou
cs.AI

Abstract

Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
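
To make the mechanism concrete, here is a minimal sketch of activation steering with a tunable scaling factor, written in PyTorch with HuggingFace Transformers. It is an illustration of the general recipe the abstract describes, not the authors' released implementation: the model name, layer index, contrastive prompt pair, and helper names (`mean_hidden`, `generate_with_alpha`) are all assumptions for the example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # illustrative choice; any causal LM
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
model.eval()

LAYER = 12  # which decoder layer to steer; a model-specific assumption


@torch.no_grad()
def mean_hidden(prompt: str) -> torch.Tensor:
    """Mean hidden state of `prompt` at the output of decoder layer LAYER."""
    ids = tok(prompt, return_tensors="pt")
    out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embedding layer, so index LAYER + 1
    # is the output of decoder layer LAYER.
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)


# Steering vector = activation difference between a "deep reasoning"
# prompt and a plain prompt (the contrastive pair is an assumption).
v = mean_hidden("Think step by step, carefully and thoroughly.") \
    - mean_hidden("Answer directly.")


def make_hook(alpha: float):
    # Add alpha * v to the residual stream at the hooked layer.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * v.to(hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return hook


def generate_with_alpha(prompt: str, alpha: float) -> str:
    # `model.model.layers` is the decoder stack in Llama/Qwen-style models.
    handle = model.model.layers[LAYER].register_forward_hook(make_hook(alpha))
    try:
        ids = tok(prompt, return_tensors="pt")
        out = model.generate(**ids, max_new_tokens=256, do_sample=False)
    finally:
        handle.remove()
    return tok.decode(out[0], skip_special_tokens=True)


# alpha acts as the "fraction" of reasoning: 0 leaves the model unchanged,
# larger values push generations further along the deep-reasoning direction.
print(generate_with_alpha("What is 17 * 24?", alpha=0.8))
```

In this reading, sweeping alpha per input is what gives the continuous, prompt-free control over reasoning intensity that the abstract calls fractional reasoning; the same mechanism can then be plugged into Best-of-N sampling or self-reflection loops.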