
R1-RE: Cross-Domain Relationship Extraction with RLVR

July 7, 2025
Authors: Runpeng Dai, Tong Zheng, Run Yang, Hongtu Zhu
cs.AI

Abstract

Relationship extraction (RE) is a core task in natural language processing. Traditional approaches typically frame RE as a supervised learning problem, directly mapping context to labels, an approach that often suffers from poor out-of-domain (OOD) generalization. Inspired by the workflow of human annotators, we reframe RE as a reasoning task guided by annotation guidelines and introduce R1-RE, the first reinforcement learning with verifiable reward (RLVR) framework for RE tasks. Our method elicits the reasoning abilities of small language models for annotation tasks, resulting in significantly improved OOD robustness. We evaluate our approach on the public Sem-2010 dataset and a private MDKG dataset. The R1-RE-7B model attains an average OOD accuracy of approximately 70%, on par with leading proprietary models such as GPT-4o. Additionally, our comprehensive analysis provides novel insights into the training dynamics and emergent reasoning behaviors of the RLVR paradigm for RE.
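To make the "verifiable reward" idea concrete: in RLVR-style training, the reward comes from programmatically checking the model's final answer against the gold annotation rather than from a learned reward model. The sketch below is a minimal illustration of such a check for a relation-extraction completion; the answer format, function names, and scoring scheme are assumptions for illustration, not the paper's actual implementation.

```python
def extract_answer(completion: str) -> str:
    """Pull the final relation label from a completion that ends with a
    line like 'Answer: Cause-Effect(e1,e2)' (assumed output format)."""
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""  # no parseable answer found

def verifiable_reward(completion: str, gold_label: str) -> float:
    """Binary verifiable reward: 1.0 iff the predicted relation label
    exactly matches the gold annotation, else 0.0."""
    return 1.0 if extract_answer(completion) == gold_label else 0.0
```

Because the reward is a deterministic function of the completion and the gold label, it can supervise a policy-gradient RL loop (e.g. PPO- or GRPO-style) without any human feedback during training.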