DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

Abstract

Dense retrieval embedding models are a fundamental component of modern retrieval-based AI systems. Most dense retrievers are trained with contrastive objectives, which require labeled positive and negative document pairs that are often costly and difficult to obtain. In this work, we investigate whether the autoregressive next-token prediction objective of a large language model (LLM) can provide supervision for dense retrieval. The intuition is simple: if a document contains information relevant to a query, conditioning on that document should make the target output easier for the LLM to predict. A key challenge is that the next-token prediction loss is computed inside the LLM, while the retriever is a separate embedding model. To address this challenge, we propose DREAM (Dense Retrieval Embeddings via Autoregressive Modeling), which injects retriever-generated query-document similarity scores into selected attention heads of a frozen LLM. During training, these scores determine how much attention each candidate document receives while the LLM predicts the target output. The resulting prediction loss then provides gradients for retriever training through the attention mechanism. We evaluate DREAM on the BEIR and RTEB retrieval benchmarks using embedding backbones ranging from 0.5B to 3B parameters. DREAM consistently outperforms existing baselines across these model scales. These results demonstrate that DREAM is a promising approach for training dense retrievers through autoregressive modeling.
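The mechanism described above can be illustrated with a toy, dependency-free Python sketch. In this sketch, retriever scores act as attention logits inside a frozen model, and the next-token loss is backpropagated into the retriever. It is not the DREAM implementation. Several pieces are hypothetical: the bilinear retriever, the single frozen "head" that reads one value vector per document, the one-hot features, and the names `loss_and_grad` and `train`. We also assume that the injected scores fully replace the head's attention logits over the candidate documents. The paper's exact injection scheme may differ.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_and_grad(W_r, x_q, docs, values, W_out, target):
    """Toy forward pass (retriever -> frozen head -> next-token loss) plus manual backprop.

    Returns (loss, attention weights, dL/dW_r). Only W_r (the hypothetical retriever)
    receives a gradient; `values` and `W_out` stand in for the frozen LLM.
    """
    # Hypothetical bilinear retriever: s_k = x_q^T W_r x_k (a stand-in for a dual encoder).
    scores = [dot(x_q, [dot(row, x_k) for row in W_r]) for x_k in docs]
    # Assumption: the injected scores serve directly as the head's attention logits.
    attn = softmax(scores)
    dim = len(values[0])
    ctx = [sum(a * v[i] for a, v in zip(attn, values)) for i in range(dim)]
    # Frozen output projection -> next-token distribution; loss is NLL of the target token.
    probs = softmax([dot(row, ctx) for row in W_out])
    loss = -math.log(probs[target])

    # Backward pass, chain rule written out by hand.
    g_logits = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    g_ctx = [sum(W_out[j][i] * g_logits[j] for j in range(len(W_out))) for i in range(dim)]
    g_attn = [dot(v, g_ctx) for v in values]
    mean_g = dot(attn, g_attn)
    # Softmax Jacobian: dL/ds_k = a_k * (dL/da_k - sum_j a_j dL/da_j)
    g_scores = [a * (g - mean_g) for a, g in zip(attn, g_attn)]
    # ds_k / dW_r[a][b] = x_q[a] * x_k[b]
    grad_W_r = [[sum(g_s * x_q[a] * x_k[b] for g_s, x_k in zip(g_scores, docs))
                 for b in range(len(W_r[0]))] for a in range(len(W_r))]
    return loss, attn, grad_W_r

def train(steps=200, lr=1.0):
    # Hypothetical toy data: one query and three candidate documents with one-hot features.
    x_q = [1.0, 0.0, 0.0]
    docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # The frozen head reads value v_k from document k; only doc 0 supports target token 0.
    values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    W_out = [[2.0, 0.0], [0.0, 2.0]]
    target = 0
    W_r = [[0.0] * 3 for _ in range(3)]  # uninformative start: uniform attention
    history = []
    for _ in range(steps):
        loss, attn, grad = loss_and_grad(W_r, x_q, docs, values, W_out, target)
        history.append((loss, attn))
        # Gradient descent on the retriever only; the "LLM" parameters never change.
        W_r = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W_r, grad)]
    return history

history = train()
print(f"initial loss {history[0][0]:.3f}, attention {[round(a, 3) for a in history[0][1]]}")
print(f"final   loss {history[-1][0]:.3f}, attention {[round(a, 3) for a in history[-1][1]]}")
```

In this toy, attention shifts toward document 0 during training, because it is the only candidate whose value vector makes the target token easier to predict. The retriever is never given a relevance label. The backward pass is written by hand only to make the gradient path through the attention weights explicit. A practical implementation would rely on an autodiff framework instead.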