FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
March 22, 2024
Authors: Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini
cs.AI
Abstract
Modern Large Language Models (LLMs) are capable of following long and complex
instructions that enable a diverse range of user tasks. However, despite
Information Retrieval (IR) models using LLMs as the backbone of their
architectures, nearly all of them still only take queries as input, with no
instructions. For the handful of recent models that do take instructions, it's
unclear how they use them. We introduce our dataset FollowIR, which contains a
rigorous instruction evaluation benchmark as well as a training set for helping
IR models learn to better follow real-world instructions. FollowIR builds off
the long history of the TREC conferences: as TREC provides human annotators
with instructions (also known as narratives) to determine document relevance,
so should IR models be able to understand and decide relevance based on these
detailed instructions. Our evaluation benchmark starts with three deeply judged
TREC collections and alters the annotator instructions, re-annotating relevant
documents. This process lets us measure how well IR models follow
instructions using a new pairwise evaluation framework. Our results indicate
that existing retrieval models fail to correctly use instructions, using them
only for basic keywords and struggling to understand long-form information. However,
we show that it is possible for IR models to learn to follow complex
instructions: our new FollowIR-7B model has significant improvements (over 13%)
after fine-tuning on our training set.