Detecting RLVR Training Data via Structural Convergence of Reasoning

February 12, 2026
Authors: Hongbo Zhang, Yue Yang, Jianhao Yan, Guangsheng Bao, Yue Zhang
cs.AI

Abstract

Reinforcement learning with verifiable rewards (RLVR) is central to training modern reasoning models, but its undisclosed training data raises concerns about benchmark contamination. Unlike pretraining, which optimizes models using token-level probabilities, RLVR fine-tunes models on reward feedback from self-generated reasoning trajectories, making conventional likelihood-based detection methods less effective. We show that RLVR induces a distinctive behavioral signature: prompts encountered during RLVR training yield more rigid and similar generations, while unseen prompts retain greater diversity. We introduce Min-kNN Distance, a simple black-box detector that quantifies this collapse by sampling multiple completions for a given prompt and computing the average of the k smallest nearest-neighbor edit distances. Min-kNN Distance requires no access to a reference model or token probabilities. Experiments across multiple RLVR-trained reasoning models show that Min-kNN Distance reliably distinguishes RL-seen examples from unseen ones and outperforms existing membership inference and RL contamination detection baselines.
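The abstract specifies the statistic concretely enough to sketch. Below is a minimal Python illustration, not the paper's implementation: the character-level Levenshtein distance, the function names, and the default k are all assumptions (the paper may compute distances over tokens or normalize differently).

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (character level, an assumption)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def min_knn_distance(completions: list[str], k: int = 3) -> float:
    """Average of the k smallest nearest-neighbor edit distances
    among sampled completions; low values indicate collapsed,
    near-duplicate generations (i.e., a likely RL-seen prompt)."""
    n = len(completions)
    # Nearest-neighbor distance for each completion.
    nn = [min(levenshtein(completions[i], completions[j])
              for j in range(n) if j != i)
          for i in range(n)]
    k = min(k, n)
    return sum(sorted(nn)[:k]) / k
```

In use, one would sample several completions for the prompt at a nonzero temperature (the sample count and any decision threshold are assumptions to be calibrated, e.g., against prompts known to be unseen) and flag the prompt as likely RL-seen when the score falls below the threshold.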