Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models

Abstract

Despite their remarkable natural language understanding capabilities, Large Language Models (LLMs) have been underutilized for retrieval tasks. We present Search-R3, a novel framework that addresses this limitation by adapting LLMs to generate search embeddings as a direct output of their reasoning process. Our approach exploits LLMs' chain-of-thought capabilities, allowing them to produce more effective embeddings by reasoning step by step through complex semantic analyses. We implement this through three complementary mechanisms: (1) a supervised learning stage that equips the model to produce high-quality embeddings, (2) a reinforcement learning (RL) methodology that optimizes embedding generation alongside reasoning, and (3) a specialized RL environment that efficiently handles evolving embedding representations without re-encoding the entire corpus at each training iteration. Extensive evaluations on diverse benchmarks demonstrate that Search-R3 significantly outperforms prior methods by unifying the reasoning and embedding generation processes. This integrated post-training approach represents a substantial advance in handling complex knowledge-intensive tasks that require both sophisticated reasoning and effective information retrieval. Project page: https://github.com/ytgui/Search-R3
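
The abstract does not say how the RL environment avoids re-encoding the whole corpus, so the following is only a minimal sketch of one plausible reading: cache document embeddings once, refresh a chosen subset as the encoder changes, and score a query embedding by the reciprocal rank of its positive document. Every name here (`CachedCorpusEnv`, `refresh`, `reward`, `toy_encode`) is hypothetical and does not come from the Search-R3 code base, and the keyword-count encoder merely stands in for an LLM that emits an embedding after reasoning.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_encode(text):
    """Stand-in encoder (assumption): counts of a few hand-picked keywords."""
    words = text.lower().split()
    return [float(words.count(k)) for k in ("cat", "dog", "stock")]


class CachedCorpusEnv:
    """Hypothetical RL environment that keeps (possibly stale) document
    embeddings and re-encodes only the documents it is asked to refresh."""

    def __init__(self, docs, encode):
        self.docs = docs  # doc_id -> text
        # One-time full encode; later updates go through refresh().
        self.cache = {doc_id: encode(text) for doc_id, text in docs.items()}
        self.encode_calls = len(docs)

    def refresh(self, doc_ids, encode):
        """Re-encode a chosen subset of documents with the current encoder."""
        for doc_id in doc_ids:
            self.cache[doc_id] = encode(self.docs[doc_id])
        self.encode_calls += len(doc_ids)

    def reward(self, query_emb, positive_id):
        """Reciprocal rank of the positive document under cosine similarity."""
        ranked = sorted(
            self.cache,
            key=lambda doc_id: cosine(query_emb, self.cache[doc_id]),
            reverse=True,
        )
        return 1.0 / (ranked.index(positive_id) + 1)


docs = {
    "d1": "the cat sat with another cat",
    "d2": "dog chases dog",
    "d3": "stock market news",
}
env = CachedCorpusEnv(docs, toy_encode)
top_reward = env.reward(toy_encode("dog"), "d2")  # positive document ranks first
env.refresh(["d2"], toy_encode)  # one extra encode call, not a full re-encode
```

In this toy setting the cost of a training step scales with the number of refreshed documents rather than the corpus size; whether Search-R3 selects documents to refresh this way is not stated in the abstract.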
