RARe: Retrieval Augmented Retrieval with In-Context Examples
October 26, 2024
Authors: Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi, Eunsol Choi
cs.AI
Abstract
We investigate whether in-context examples, widely used in decoder-only
language models (LLMs), can improve embedding model performance in retrieval
tasks. Unlike in LLMs, naively prepending in-context examples (query-document
pairs) to the target query at inference time does not work out of the box. We
introduce a simple approach to enable retrievers to use in-context examples.
Our approach, RARe, finetunes a pre-trained model with in-context examples
whose query is semantically similar to the target query. This can be applied to
adapt various base architectures (i.e., decoder-only language models, retriever
models) and consistently achieves performance gains of up to +2.72% nDCG across
various open-domain retrieval datasets (BeIR, RAR-b). In particular, we find
RARe exhibits stronger out-of-domain generalization compared to models using
queries without in-context examples, similar to what is seen for in-context
learning in LLMs. We further provide analysis on the design choices of
in-context example augmentation and lay the foundation for future work in this
space.
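The abstract describes the core recipe concretely enough to sketch: select (query, document) pairs whose queries are semantically similar to the target query, prepend them to it, and encode the augmented string with an embedding model. Below is a minimal, hypothetical Python sketch of that input construction using sentence-transformers. The prompt template, the example pool, and the model name are assumptions for illustration, not the paper's exact setup.

```python
# A minimal sketch (not the paper's released code) of the input format the
# abstract describes: prepend (query, document) in-context examples whose
# queries are semantically similar to the target query, then encode the
# augmented string. Template, pool, and model name are assumptions.
from sentence_transformers import SentenceTransformer, util

# Any off-the-shelf embedding model stands in for the finetuned retriever.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Small pool of (query, document) training pairs to draw examples from.
example_pool = [
    ("what causes rain", "Rain forms when water vapor condenses into droplets."),
    ("how do vaccines work", "Vaccines expose the immune system to an antigen."),
    ("why is the sky blue", "Rayleigh scattering favors shorter wavelengths."),
]

def build_rare_style_query(target_query: str, k: int = 2) -> str:
    """Pick the k pairs whose queries are most similar to the target query
    and prepend them to it in a simple textual template (assumed format)."""
    pool_queries = [q for q, _ in example_pool]
    sims = util.cos_sim(
        model.encode(target_query, convert_to_tensor=True),
        model.encode(pool_queries, convert_to_tensor=True),
    )[0]
    top = sims.argsort(descending=True)[:k].tolist()
    parts = [f"query: {example_pool[i][0]} document: {example_pool[i][1]}"
             for i in top]
    parts.append(f"query: {target_query}")
    return "\n".join(parts)

augmented = build_rare_style_query("what makes thunderstorms form")
query_vec = model.encode(augmented)  # embedding used for retrieval
```

Note that, per the abstract, formatting alone does not work out of the box; RARe's gains come from finetuning the model on queries augmented this way, so the sketch above only illustrates the input construction.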