Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
June 29, 2024
Authors: Omer Goldman, Alon Jacovi, Aviv Slobodkin, Aviya Maimon, Ido Dagan, Reut Tsarfaty
cs.AI
Abstract
Improvements in language models' capabilities have pushed their applications
towards longer contexts, making long-context evaluation and development an
active research area. However, many disparate use-cases are grouped together
under the umbrella term of "long-context", defined simply by the total length
of the model's input, including, for example, Needle-in-a-Haystack tasks,
book summarization, and information aggregation. Given their varied difficulty,
in this position paper we argue that conflating different tasks by their
context length is unproductive. As a community, we require a more precise
vocabulary to understand what makes long-context tasks similar or different. We
propose to unpack the taxonomy of long-context based on the properties that
make tasks more difficult with longer contexts. We propose two orthogonal axes
of difficulty: (I) Diffusion: How hard is it to find the necessary information
in the context? (II) Scope: How much necessary information is there to find? We
survey the literature on long-context, provide justification for this taxonomy
as an informative descriptor, and situate the literature with respect to it. We
conclude that the most difficult and interesting settings, whose necessary
information is very long and highly diffused within the input, are severely
under-explored. By using a descriptive vocabulary and discussing the relevant
properties of difficulty in long-context, we can implement more informed
research in this area. We call for a careful design of tasks and benchmarks
with distinctly long context, taking into account the characteristics that make
it qualitatively different from shorter context.