The Illusion of Reasoning: Exposing Stealthy Data Contamination in LLMs via Zero-CoT Truncation

Abstract

Large language models (LLMs) have demonstrated impressive reasoning abilities across a wide range of tasks, but data contamination undermines the objective evaluation of these capabilities. Malicious model publishers exacerbate the problem by using evasive (indirect) contamination strategies, such as paraphrasing benchmark data, to evade existing detection methods and artificially boost leaderboard performance. Current approaches struggle to reliably detect such stealthy contamination. In this work, we uncover a critical phenomenon: the reasoning steps a model generates actively mask its underlying memorization. Inspired by this, we propose the Zero-CoT Probe (ZCP), a novel black-box detection method that deliberately truncates the entire Chain-of-Thought (CoT) process to expose latent shortcut mappings. To further separate memorization from the model's intrinsic problem-solving ability, ZCP compares the model's zero-CoT performance on the original benchmark with its zero-CoT performance on an isomorphically perturbed reference dataset. We also introduce Contamination Confidence, a metric that quantifies both the likelihood and the severity of contamination, moving beyond simple binary classification. Extensive experiments on models previously identified as contaminated and on models specially fine-tuned to be contaminated demonstrate that ZCP robustly detects both direct and evasive data contamination. The code for ZCP is available at https://github.com/Yifan-Lan/zero-cot-probe.
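The comparison at the heart of the probe can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the ZCP repository: `zero_cot_accuracy`, `gap_and_p_value`, `toy_model`, and the toy data are hypothetical, and the one-sided two-proportion z-test is a generic stand-in, not the paper's Contamination Confidence metric, whose definition is not given in the abstract. The answering function is assumed to return a final answer with no reasoning tokens.

```python
import math
from typing import Callable, Sequence, Tuple

Item = Tuple[str, str]  # (question, gold answer)


def zero_cot_accuracy(answer_fn: Callable[[str], str], items: Sequence[Item]) -> float:
    """Exact-match accuracy when the model must answer with no reasoning.

    `answer_fn` is a hypothetical callable that is assumed to return only
    the final answer (the chain of thought is cut off before it starts).
    """
    correct = sum(1 for question, gold in items
                  if answer_fn(question).strip() == gold.strip())
    return correct / len(items)


def gap_and_p_value(acc_orig: float, n_orig: int,
                    acc_ref: float, n_ref: int) -> Tuple[float, float]:
    """Accuracy gap (original minus reference) and a one-sided p-value.

    Illustrative stand-in only: a pooled two-proportion z-test, NOT the
    paper's Contamination Confidence.
    """
    gap = acc_orig - acc_ref
    pooled = (acc_orig * n_orig + acc_ref * n_ref) / (n_orig + n_ref)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_orig + 1.0 / n_ref))
    if se == 0.0:
        # Both accuracies are 0 or both are 1, so there is no gap to test.
        return gap, 1.0
    z = gap / se
    # Upper-tail probability of a standard normal at z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return gap, p_value


# Toy stand-in for a contaminated model: it has "memorized" the original
# items and cannot answer anything else without reasoning.
_memorized = {"12 + 30 = ?": "42", "7 * 8 = ?": "56",
              "100 - 1 = ?": "99", "9 * 9 = ?": "81"}


def toy_model(question: str) -> str:
    return _memorized.get(question, "0")


original = list(_memorized.items())
# Same problem structure, different surface values (hypothetical perturbation).
reference = [("13 + 31 = ?", "44"), ("6 * 9 = ?", "54"),
             ("100 - 2 = ?", "98"), ("8 * 8 = ?", "64")]

if __name__ == "__main__":
    acc_o = zero_cot_accuracy(toy_model, original)
    acc_r = zero_cot_accuracy(toy_model, reference)
    gap, p = gap_and_p_value(acc_o, len(original), acc_r, len(reference))
    print(f"original={acc_o:.2f} reference={acc_r:.2f} gap={gap:.2f} p={p:.4f}")
```

A large positive gap with a small p-value suggests that zero-CoT success on the original benchmark comes from memorized question-to-answer mappings rather than from problem-solving ability, since the perturbed items require the same skills. With a real model, the answering function would need to enforce the truncation itself, for example by constraining the output to the answer only.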