R1-RE: Cross-Domain Relationship Extraction with RLVR
July 7, 2025
Authors: Runpeng Dai, Tong Zheng, Run Yang, Hongtu Zhu
cs.AI
Abstract
Relationship extraction (RE) is a core task in natural language processing.
Traditional approaches typically frame RE as a supervised learning problem,
directly mapping context to labels, an approach that often suffers from poor
out-of-domain (OOD) generalization. Inspired by the workflow of human
annotators, we reframe RE as a reasoning task guided by annotation guidelines
and introduce R1-RE, the first reinforcement learning with verifiable reward
(RLVR) framework for RE tasks. Our method elicits the reasoning abilities of
small language models for annotation tasks, resulting in significantly improved
OOD robustness. We evaluate our approach on the public Sem-2010 dataset and a
private MDKG dataset. The R1-RE-7B model attains an average OOD accuracy of
approximately 70%, on par with leading proprietary models such as GPT-4o.
Additionally, our comprehensive analysis provides novel insights into the
training dynamics and emergent reasoning behaviors of the RLVR paradigm for RE.
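
The abstract does not reproduce the reward function itself, but the defining property of RLVR is that each rollout is scored by a programmatic check rather than a learned reward model. The sketch below illustrates what such a verifiable reward could look like for relation extraction, assuming the policy emits free-form reasoning followed by a final label inside <answer>...</answer> tags (a common RLVR output convention); the tag format and the function name `verifiable_reward` are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a verifiable reward for RLVR-style relation
# extraction: the rollout earns reward 1.0 only if its final answer
# exactly matches the gold relation label, and 0.0 otherwise.
import re

# Assumed output convention: the model wraps its final label in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def verifiable_reward(completion: str, gold_label: str) -> float:
    """Score a model rollout against the gold annotation."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # malformed output: no parsable answer, no reward
    predicted = match.group(1).strip()
    return 1.0 if predicted == gold_label else 0.0

# Example rollout: reasoning guided by an annotation guideline, then an answer.
completion = (
    "The guideline defines Cause-Effect(e1,e2) as e1 producing e2. "
    "Here the earthquake caused the tsunami, so this relation applies. "
    "<answer>Cause-Effect(e1,e2)</answer>"
)
print(verifiable_reward(completion, "Cause-Effect(e1,e2)"))  # 1.0
```

Because the reward is computed by direct comparison against the gold label, plausible-sounding but incorrect reasoning earns nothing, which is what allows this style of training to shape the reasoning chain end-to-end without a separate reward model.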