Diagnosing Harmful Continuation in Answer-Correct Long Chain-of-Thought Training Traces

Abstract

Long chain-of-thought (CoT) traces are widely used as supervision for supervised fine-tuning (SFT) of reasoning-oriented large language models (LLMs), yet traces that all reach the correct answer can still lead to markedly different fine-tuning outcomes. We study post-conclusion continuation in answer-correct long-CoT data: the answer already appears sufficiently supported, but the trace goes on with additional reasoning that remains in the supervised target. To test its effect on training, we use a delete-only editor to construct an answer-preserving suffix removal and compare CoT-based SFT on the original and the processed traces. We observe improved SFT outcomes after removing the editor-identified post-conclusion continuation, suggesting that this continuation is harmful to training in our setting. We therefore refer to this empirically supported phenomenon as harmful continuation. Beyond this intervention, we characterize the removed post-conclusion continuation through uncertainty and hidden-state progress: local uncertainty persists while terminal-directional progress weakens, forming an uncertainty-geometry mismatch. Finally, we instantiate Harmful Continuation Cut (HCC), a lightweight boundary proxy that approximates the editor-identified post-conclusion continuation boundary.
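
As a rough illustration of what an answer-preserving, delete-only suffix removal could look like, the sketch below truncates a trace at a given boundary and checks that nothing was rewritten. The `Trace` type, the function names, and the assumption that the boundary index is supplied externally are all hypothetical; this is not the editor or the HCC proxy described above, only a minimal picture of the edit operation itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    """Hypothetical container: ordered reasoning steps plus the final answer."""
    steps: List[str]
    answer: str


def remove_suffix(trace: Trace, boundary: int) -> Trace:
    """Delete-only edit: drop every step from index `boundary` onward.

    The boundary is assumed to come from some external source
    (for example an editor or a proxy); it is not computed here.
    """
    if not 0 <= boundary <= len(trace.steps):
        raise ValueError("boundary must lie within the trace")
    # Keep the prefix of steps and leave the answer untouched.
    return Trace(steps=trace.steps[:boundary], answer=trace.answer)


def is_delete_only(original: Trace, edited: Trace) -> bool:
    """Check that the edit kept a prefix of the steps and the same answer."""
    n = len(edited.steps)
    return (
        edited.answer == original.answer
        and edited.steps == original.steps[:n]
    )


# Toy usage: the last two steps continue after the answer is supported.
trace = Trace(
    steps=[
        "Set up the equation 2x + 1 = 9.",
        "Solve: x = 4.",
        "Wait, let me re-check the setup once more.",
        "Alternatively, try substituting values.",
    ],
    answer="4",
)
cut = remove_suffix(trace, boundary=2)
```

Because the edit only deletes a suffix, the processed trace is always a prefix of the original with an unchanged answer, which keeps the comparison between the two SFT runs focused on the removed continuation.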