R1-RE: Cross-Domain Relationship Extraction with RLVR

Abstract

Relationship extraction (RE) is a core task in natural language processing. Traditional approaches typically frame RE as a supervised learning problem that maps context directly to labels, an approach that often suffers from poor out-of-domain (OOD) generalization. Inspired by the workflow of human annotators, we reframe RE as a reasoning task guided by annotation guidelines and introduce R1-RE, the first reinforcement learning with verifiable reward (RLVR) framework for RE tasks. Our method elicits the reasoning abilities of small language models for annotation tasks, which significantly improves OOD robustness. We evaluate our approach on the public Sem-2010 dataset and a private MDKG dataset. The R1-RE-7B model attains an average OOD accuracy of approximately 70%, on par with leading proprietary models such as GPT-4o. Additionally, our comprehensive analysis provides novel insights into the training dynamics and emergent reasoning behaviors of the RLVR paradigm for RE.

