Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

June 8, 2025
Authors: Roy Eisenstadt, Itamar Zimerman, Lior Wolf
cs.AI

Abstract

Recently, techniques such as explicit structured reasoning have demonstrated strong test-time scaling behavior by enforcing a separation between the model's internal "thinking" process and the final response. A key factor influencing answer quality in this setting is the length of the thinking stage. When the reasoning is too short, the model may fail to capture the complexity of the task. Conversely, when it is too long, the model may overthink, leading to unnecessary computation and degraded performance. This paper explores and exploits the underlying mechanisms by which LLMs understand and regulate the length of their reasoning during explicit thought processes. First, we show that LLMs encode their progress through the reasoning process and introduce an interactive progress-bar visualization, which is then used to reveal insights into the model's planning dynamics. Second, we manipulate the internal progress encoding during inference to reduce unnecessary steps and generate a more concise and decisive chain of thought. Our empirical results demonstrate that this "overclocking" method mitigates overthinking, improves answer accuracy, and reduces inference latency. Our code is publicly available.
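
To make the abstract's two steps concrete, here is a minimal sketch, not the authors' released implementation, of one plausible way to (1) fit a linear probe that reads relative reasoning progress from hidden states and (2) steer hidden states along that probe direction at inference to "overclock" the thinking phase. The hidden size, layer index, probe form, and steering coefficient `alpha` are illustrative assumptions; the placeholder tensors stand in for activations collected from real thinking traces.

```python
# Sketch of the progress-probe + steering idea described in the abstract.
# All model-specific details (HIDDEN_DIM, PROBE_LAYER, alpha) are assumptions.

import torch
import torch.nn as nn

HIDDEN_DIM = 4096   # assumed hidden size of the reasoning LLM
PROBE_LAYER = 20    # assumed intermediate layer to read and steer
alpha = 4.0         # assumed steering strength ("overclocking" amount)

# --- 1. Fit a linear probe: hidden state -> relative progress in [0, 1] ------
# In practice, `states` would be hidden states gathered across many thinking
# traces and `progress` each token's relative position inside the thinking span.
states = torch.randn(10_000, HIDDEN_DIM)   # placeholder activations
progress = torch.rand(10_000, 1)           # placeholder progress labels

probe = nn.Linear(HIDDEN_DIM, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(torch.sigmoid(probe(states)), progress)
    loss.backward()
    opt.step()

# --- 2. "Overclock" at inference: shift hidden states toward later progress --
direction = probe.weight.detach().squeeze(0)
direction = direction / direction.norm()

def overclock_hook(module, inputs, output):
    """Forward hook for the chosen transformer layer: nudge every token's
    hidden state along the progress direction before passing it onward."""
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + alpha * direction.to(hidden.dtype)
    return (steered, *output[1:]) if isinstance(output, tuple) else steered

# Usage (assuming a HuggingFace-style decoder):
#   handle = model.model.layers[PROBE_LAYER].register_forward_hook(overclock_hook)
#   ...generate as usual; call handle.remove() to restore normal behavior.
```

The design choice illustrated here is that progress is treated as a roughly linear direction in activation space, so shortening the chain of thought reduces to a constant additive shift at one layer; the paper's actual probe and intervention may differ.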