Detecting RLVR Training Data via Structural Convergence of Reasoning

February 12, 2026
Authors: Hongbo Zhang, Yue Yang, Jianhao Yan, Guangsheng Bao, Yue Zhang
cs.AI

Abstract

Reinforcement learning with verifiable rewards (RLVR) is central to training modern reasoning models, but the undisclosed training data raises concerns about benchmark contamination. Unlike pretraining, which optimizes models on token-level probabilities, RLVR fine-tunes models with reward feedback on self-generated reasoning trajectories, making conventional likelihood-based detection methods less effective. We show that RLVR induces a distinctive behavioral signature: prompts encountered during RLVR training yield more rigid and mutually similar generations, while unseen prompts retain greater diversity. We introduce Min-kNN Distance, a simple black-box detector that quantifies this collapse by sampling multiple completions for a given prompt and computing the average of the k smallest nearest-neighbor edit distances. Min-kNN Distance requires no access to a reference model or token probabilities. Experiments across multiple RLVR-trained reasoning models show that Min-kNN Distance reliably distinguishes RL-seen examples from unseen ones and outperforms existing membership inference and RL contamination detection baselines.
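The detection recipe in the abstract (sample several completions for a prompt, measure nearest-neighbor edit distances, average the k smallest) is concrete enough to sketch. Below is a minimal Python sketch under stated assumptions: the helper names (`levenshtein`, `min_knn_distance`), the character-level Levenshtein distance, and the toy values of n and k are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of a Min-kNN Distance detector, following the abstract's
# description. All names, defaults, and the character-level distance are
# illustrative assumptions; the paper's exact formulation may differ.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via the classic two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner loop over the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        prev = curr
    return prev[-1]


def min_knn_distance(completions: list[str], k: int = 5) -> float:
    """Average of the k smallest nearest-neighbor edit distances.

    A low score means the sampled generations have collapsed onto
    near-duplicates, the behavioral signature the abstract attributes
    to prompts seen during RLVR training.
    """
    n = len(completions)
    if n < 2:
        raise ValueError("need at least two completions")
    # Each completion's distance to its closest neighbor in the sample.
    nn_dists = [
        min(levenshtein(completions[i], completions[j])
            for j in range(n) if j != i)
        for i in range(n)
    ]
    k = min(k, n)
    return sum(sorted(nn_dists)[:k]) / k


if __name__ == "__main__":
    # Toy demonstration: collapsed samples score near zero, diverse ones higher.
    collapsed = ["The answer is 42."] * 6 + ["So the answer is 42."] * 2
    diverse = [
        "First, factor n and count divisors.",
        "Consider the contrapositive instead.",
        "Proceed by induction on k.",
        "Rewrite the sum so it telescopes.",
        "Apply the Cauchy-Schwarz inequality.",
        "Compare with a geometric series.",
    ]
    print(min_knn_distance(collapsed, k=3))  # ~0.0 -> suspicious (RLVR-seen?)
    print(min_knn_distance(diverse, k=3))    # larger -> likely unseen
```

In practice the completions would be drawn from the model under test with temperature sampling and the score thresholded per prompt; the number of samples, the threshold, and any length normalization are operational details the abstract does not specify, so they are left as assumptions here.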