Diagnosing Harmful Continuation in Answer-Correct Long Chain-of-Thought Training Traces

Abstract

Long chain-of-thought (CoT) traces are widely used as supervision for supervised fine-tuning (SFT) of reasoning-oriented large language models (LLMs), yet answer-correct traces can still lead to markedly different fine-tuning outcomes. We study post-conclusion continuation in answer-correct long-CoT data: cases where the answer already appears sufficiently supported, yet the trace continues with additional reasoning that remains in the supervised target. To test its effect on training, we use a delete-only editor to perform answer-preserving suffix removal and compare CoT-based SFT on the original and processed traces. We observe improved SFT outcomes after removing the editor-identified post-conclusion continuation, suggesting that this continuation is harmful to training in our setting. We therefore refer to this empirically supported phenomenon as harmful continuation. Beyond this intervention, we characterize the removed post-conclusion continuation through uncertainty and hidden-state progress. We observe persistent local uncertainty together with weakened terminal-directional progress, which we call an uncertainty-geometry mismatch. Finally, we instantiate Harmful Continuation Cut (HCC), a lightweight boundary proxy that approximates the editor-identified post-conclusion continuation boundary.
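To make the idea of delete-only, answer-preserving suffix removal concrete, the following is a minimal sketch and not the editor or the HCC procedure, neither of which is specified in this abstract. The names `Trace`, `cut_suffix`, and `boundary` are hypothetical, the boundary index is assumed to be supplied externally (for example by an editor or a boundary proxy), and the substring check is only a crude stand-in for "the answer is already supported".

```python
from dataclasses import dataclass


@dataclass
class Trace:
    steps: list[str]  # reasoning steps, in order (hypothetical representation)
    answer: str       # final answer string, assumed known and correct


def cut_suffix(trace: Trace, boundary: int) -> Trace:
    """Delete-only edit: keep steps[:boundary] verbatim and drop the rest."""
    if not 0 < boundary <= len(trace.steps):
        raise ValueError("boundary must lie inside the trace")
    kept = trace.steps[:boundary]
    # Crude answer-preservation check (an assumption of this sketch):
    # the answer string must already appear in the retained prefix.
    if not any(trace.answer in step for step in kept):
        raise ValueError("cut would remove the statement of the answer")
    return Trace(steps=kept, answer=trace.answer)


example = Trace(
    steps=[
        "2x + 3 = 11, so 2x = 8.",
        "Thus x = 4.",
        "Wait, let me double-check by another route.",
        "Hmm, maybe consider x = 5? No.",
    ],
    answer="x = 4",
)
trimmed = cut_suffix(example, boundary=2)
```

Because the edit only deletes a suffix, the retained steps are identical to the original prefix. This keeps a comparison between SFT on original and processed traces focused on the removed continuation rather than on rewritten content.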