FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

Abstract

Modern Large Language Models (LLMs) can follow long and complex instructions, enabling a wide range of user tasks. However, although Information Retrieval (IR) models use LLMs as the backbone of their architectures, nearly all of them still take only queries as input, with no instructions. For the handful of recent models that do take instructions, it is unclear how they use them. We introduce FollowIR, a dataset that contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions. FollowIR builds on the long history of the TREC conferences: just as TREC provides human annotators with instructions (also known as narratives) to determine document relevance, IR models should be able to understand these detailed instructions and decide relevance based on them. Our evaluation benchmark starts with three deeply judged TREC collections and alters the annotator instructions, re-annotating the relevant documents. This process lets us measure how well IR models follow instructions through a new pairwise evaluation framework. Our results indicate that existing retrieval models fail to use instructions correctly, treating them as a source of basic keywords and struggling to understand long-form information. However, we show that IR models can learn to follow complex instructions: our new FollowIR-7B model shows significant improvements (over 13%) after fine-tuning on our training set.
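The abstract does not spell out the pairwise evaluation, so the following is only a minimal sketch of one way such a comparison could work, not necessarily the paper's exact metric. It assumes that, for each document that becomes non-relevant once the instruction is altered, we know its 1-indexed rank under the original and the altered instruction. The function names `pairwise_rank_score` and `mean_pairwise_score`, and the scoring formula itself, are hypothetical choices made for illustration.

```python
from typing import Iterable, Tuple


def pairwise_rank_score(rank_original: int, rank_altered: int) -> float:
    """Score one document's rank change in the range [-1, 1].

    Assumption: the document became non-relevant under the altered
    instruction, so a model that follows instructions should move it DOWN.
    Positive score: the document moved down (desired behavior).
    Negative score: the document moved up. Zero: no change.
    """
    if rank_original < 1 or rank_altered < 1:
        raise ValueError("ranks must be 1-indexed positive integers")
    rr_original = 1.0 / rank_original  # reciprocal rank, original instruction
    rr_altered = 1.0 / rank_altered    # reciprocal rank, altered instruction
    if rank_original > rank_altered:
        # The document moved up the ranking: penalize.
        return rr_original / rr_altered - 1.0
    # The document stayed in place or moved down: reward.
    return 1.0 - rr_altered / rr_original


def mean_pairwise_score(rank_pairs: Iterable[Tuple[int, int]]) -> float:
    """Average the per-document scores over (original, altered) rank pairs."""
    scores = [pairwise_rank_score(orig, new) for orig, new in rank_pairs]
    if not scores:
        raise ValueError("at least one rank pair is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical ranks for three documents under the two instructions.
    pairs = [(1, 10), (5, 5), (10, 1)]
    print(mean_pairwise_score(pairs))
```

Under this sketch, a model that ignores the instruction change leaves every rank untouched and scores 0, while a model that only picks up extra keywords from the instruction may move newly non-relevant documents up and score below 0.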
