R1-RE: Cross-Domain Relationship Extraction with RLVR

Abstract

Relationship extraction (RE) is a core task in natural language processing. Traditional approaches typically frame RE as a supervised learning problem that maps context directly to labels, an approach that often generalizes poorly out of domain (OOD). Inspired by the workflow of human annotators, we reframe RE as a reasoning task guided by annotation guidelines and introduce R1-RE, the first reinforcement learning with verifiable reward (RLVR) framework for RE tasks. Our method elicits the reasoning abilities of small language models for annotation tasks, resulting in significantly improved OOD robustness. We evaluate our approach on the public Sem-2010 dataset and a private MDKG dataset. The R1-RE-7B model attains an average OOD accuracy of approximately 70%, on par with leading proprietary models such as GPT-4o. In addition, our detailed analysis provides new insights into the training dynamics and emergent reasoning behaviors of the RLVR paradigm for RE.
