Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL
April 18, 2026
Authors: Skylar Zhai, Jingcheng Liang, Dongyeop Kang
cs.AI
Abstract
Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Existing abstention methods either train models to produce generic refusals or encourage follow-up clarifications without verifying whether those clarifications identify the key missing information. We study queries that are clear in meaning but cannot be reliably resolved from the given information, and argue that a reliable model should not only abstain, but also explain what is missing. We propose a clarification-aware RLVR reward that, while rewarding correct answers on answerable queries, jointly optimizes explicit abstention and semantically aligned post-refusal clarification on unanswerable queries. Using this reward, we train Abstain-R1, a 3B model that improves abstention and clarification on unanswerable queries while preserving strong performance on answerable ones. Experiments on Abstain-Test, Abstain-QA, and SelfAware show that Abstain-R1 substantially improves over its base model and achieves unanswerable-query behavior competitive with larger systems including DeepSeek-R1, suggesting that calibrated abstention and clarification can be learned through verifiable rewards rather than emerging from scale alone.
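To make the reward mechanism concrete, the following is a minimal sketch of what a clarification-aware RLVR reward of this kind could look like, based only on the abstract's description. Every specific choice in it (the function names `detect_abstention` and `semantic_similarity`, the refusal markers, the 0.5/0.5 reward split, and the alignment threshold) is an illustrative assumption rather than the paper's actual specification.

```python
# Illustrative sketch of a clarification-aware RLVR reward.
# All names, markers, weights, and thresholds below are assumptions
# made for illustration; the paper's actual reward may differ.

ALIGN_THRESHOLD = 0.7  # assumed cutoff for a "semantically aligned" clarification


def detect_abstention(response: str) -> bool:
    """Assumed verifier: does the response explicitly refuse to answer?"""
    markers = ("cannot answer", "not answerable", "insufficient information")
    return any(m in response.lower() for m in markers)


def semantic_similarity(a: str, b: str) -> float:
    """Stand-in scorer using token overlap (Jaccard). A real system would
    likely use an embedding or NLI model to check semantic alignment."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def reward(is_answerable: bool,
           response: str,
           gold_answer: str | None,
           gold_missing_info: str | None) -> float:
    """Verifiable reward: correctness on answerable queries; explicit
    abstention plus aligned post-refusal clarification on unanswerable ones."""
    if is_answerable:
        # Abstaining on an answerable query earns nothing; only a
        # verifiably correct answer is rewarded.
        if detect_abstention(response):
            return 0.0
        return 1.0 if gold_answer and gold_answer in response else 0.0

    # Unanswerable query: guessing or hallucinating earns nothing.
    if not detect_abstention(response):
        return 0.0

    # Explicit abstention earns partial credit; full credit additionally
    # requires a clarification that identifies the key missing information.
    r = 0.5
    if (gold_missing_info
            and semantic_similarity(response, gold_missing_info) >= ALIGN_THRESHOLD):
        r += 0.5
    return r
```

The design point the abstract emphasizes is the joint check: a generic refusal alone is not fully rewarded; the post-refusal clarification must align with the actual missing information.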