Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations
June 19, 2024
Authors: Arie Cattan, Alon Jacovi, Alex Fabrikant, Jonathan Herzig, Roee Aharoni, Hannah Rashkin, Dror Marcus, Avinatan Hassidim, Yossi Matias, Idan Szpektor, Avi Caciularu
cs.AI
Abstract
Despite recent advancements in Large Language Models (LLMs), their performance on tasks involving long contexts remains sub-optimal. In-Context Learning (ICL) with few-shot examples may be an appealing solution to enhance LLM performance in this scenario; however, naively adding ICL examples with long context introduces challenges, including substantial token overhead added for each few-shot example and context mismatch between the demonstrations and the target query. In this work, we propose to automatically generate few-shot examples for long-context QA tasks by recycling contexts. Specifically, given a long input context (1-3k tokens) and a query, we generate additional query-output pairs from the given context as few-shot examples, while introducing the context only once. This ensures that the demonstrations leverage the same context as the target query while adding only a small number of tokens to the prompt. We further enhance each demonstration by instructing the model to explicitly identify the relevant paragraphs before answering, which improves performance while providing fine-grained attribution to the answer source. We apply our method to multiple LLMs and obtain substantial improvements (+23% on average across models) on various QA datasets with long context, especially when the answer lies within the middle of the context. Surprisingly, despite introducing only single-hop ICL examples, LLMs successfully generalize to multi-hop long-context QA using our approach.
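The prompt layout the abstract describes — the long context introduced once, followed by recycled demonstrations that each name their supporting paragraph, then the target query — can be sketched as below. This is a minimal illustration, not the authors' released code; the function name, the `(query, paragraph, answer)` triple format, and the exact field labels are assumptions for demonstration purposes.

```python
def build_recycled_prompt(context, demonstrations, target_query):
    """Assemble a single-context few-shot prompt.

    context: the long input passage, included only once.
    demonstrations: list of (query, supporting_paragraph, answer) triples
        generated from that same context.
    target_query: the real question the model should answer.
    """
    parts = [f"Context:\n{context}\n"]
    # Each recycled demonstration cites its relevant paragraph before the
    # answer, modeling the fine-grained attribution step.
    for query, paragraph, answer in demonstrations:
        parts.append(
            f"Question: {query}\n"
            f"Relevant paragraph: {paragraph}\n"
            f"Answer: {answer}\n"
        )
    # The target query ends mid-pattern so the model continues by first
    # identifying the relevant paragraph, then answering.
    parts.append(f"Question: {target_query}\nRelevant paragraph:")
    return "\n".join(parts)
```

Because the context string appears a single time, each additional demonstration costs only the tokens of its question, cited paragraph, and answer, rather than a full copy of the 1-3k-token context.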