Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

June 29, 2024
Authors: Omer Goldman, Alon Jacovi, Aviv Slobodkin, Aviya Maimon, Ido Dagan, Reut Tsarfaty
cs.AI

Abstract

Improvements in language models' capabilities have pushed their applications towards longer contexts, making long-context evaluation and development an active research area. However, many disparate use-cases are grouped together under the umbrella term of "long-context", defined simply by the total length of the model's input, including - for example - Needle-in-a-Haystack tasks, book summarization, and information aggregation. Given their varied difficulty, in this position paper we argue that conflating different tasks by their context length is unproductive. As a community, we require a more precise vocabulary to understand what makes long-context tasks similar or different. We propose to unpack the taxonomy of long-context based on the properties that make them more difficult with longer contexts. We propose two orthogonal axes of difficulty: (I) Diffusion: How hard is it to find the necessary information in the context? (II) Scope: How much necessary information is there to find? We survey the literature on long-context, provide justification for this taxonomy as an informative descriptor, and situate the literature with respect to it. We conclude that the most difficult and interesting settings, whose necessary information is very long and highly diffused within the input, are severely under-explored. By using a descriptive vocabulary and discussing the relevant properties of difficulty in long-context, we can implement more informed research in this area. We call for a careful design of tasks and benchmarks with distinctly long context, taking into account the characteristics that make it qualitatively different from shorter context.

