Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations
June 19, 2024
Authors: Arie Cattan, Alon Jacovi, Alex Fabrikant, Jonathan Herzig, Roee Aharoni, Hannah Rashkin, Dror Marcus, Avinatan Hassidim, Yossi Matias, Idan Szpektor, Avi Caciularu
cs.AI
Abstract
Despite recent advancements in Large Language Models (LLMs), their performance on tasks involving long contexts remains sub-optimal. In-Context Learning (ICL) with few-shot examples may be an appealing solution to enhance LLM performance in this scenario; however, naively adding ICL examples with long context introduces challenges, including substantial token overhead added for each few-shot example and context mismatch between the demonstrations and the target query. In this work, we propose to automatically generate few-shot examples for long-context QA tasks by recycling contexts. Specifically, given a long input context (1-3k tokens) and a query, we generate additional query-output pairs from the given context as few-shot examples, while introducing the context only once. This ensures that the demonstrations leverage the same context as the target query while adding only a small number of tokens to the prompt. We further enhance each demonstration by instructing the model to explicitly identify the relevant paragraphs before the answer, which improves performance while providing fine-grained attribution to the answer source. We apply our method to multiple LLMs and obtain substantial improvements (+23% on average across models) on various QA datasets with long context, especially when the answer lies within the middle of the context. Surprisingly, despite introducing only single-hop ICL examples, LLMs successfully generalize to multi-hop long-context QA using our approach.
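The prompt layout described above (context introduced once, followed by generated demonstrations that each cite a relevant paragraph before answering, then the target query) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual prompt template; the function name, field labels, and example data are all hypothetical.

```python
def build_recycled_prompt(context: str,
                          demonstrations: list[tuple[str, str, str]],
                          target_query: str) -> str:
    """Assemble one prompt that introduces the long context a single time
    and reuses it for every demonstration and for the target query.

    Each demonstration is a (question, relevant_paragraph, answer) triple,
    mirroring the idea of identifying the supporting paragraph before the
    answer. All labels below are illustrative, not from the paper.
    """
    parts = [f"Context:\n{context}\n"]
    for i, (question, paragraph, answer) in enumerate(demonstrations, 1):
        parts.append(
            f"Example {i}\n"
            f"Question: {question}\n"
            f"Relevant paragraph: {paragraph}\n"
            f"Answer: {answer}\n"
        )
    # The target query shares the context already in the prompt; no second
    # copy of the context is appended, so the token overhead stays small.
    parts.append(f"Question: {target_query}\nAnswer:")
    return "\n".join(parts)


# Toy usage with an invented two-sentence context.
context = "Alice founded Acme in 1999. Bob joined Acme in 2003."
prompt = build_recycled_prompt(
    context=context,
    demonstrations=[
        ("When was Acme founded?", "Alice founded Acme in 1999.", "1999"),
    ],
    target_query="When did Bob join Acme?",
)
print(prompt)
```

Note that the long context string appears exactly once in the assembled prompt, regardless of how many demonstrations are added; only the short question/paragraph/answer triples grow with the number of few-shot examples.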