
A Controlled Study on Long Context Extension and Generalization in LLMs

September 18, 2024
Authors: Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush
cs.AI

Abstract

Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. However, owing to differences in data and model classes, it has been challenging to compare these approaches, leading to uncertainty as to how to evaluate long-context performance and whether it differs from standard evaluation. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data. Our study yields several insights into long-context behavior. First, we reaffirm the critical role of perplexity as a general-purpose performance indicator even in longer-context tasks. Second, we find that current approximate attention methods systematically underperform across long-context tasks. Finally, we confirm that exact fine-tuning based methods are generally effective within the range of their extension, whereas extrapolation remains challenging. All codebases, models, and checkpoints will be made available open-source, promoting transparency and facilitating further research in this critical area of AI development.
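
The abstract names two technical ideas worth making concrete: fine-tuning-based context extension and perplexity as the evaluation metric. The two sketches below are minimal illustrations, not the paper's code; the model name, context lengths, and scale factor are assumed placeholders.

One widely used fine-tuning-based extension method is linear position interpolation for RoPE, which rescales positions so that a model pretrained at a shorter window sees familiar rotary angles at longer lengths:

```python
# Illustrative sketch of linear position interpolation (PI) for RoPE.
# The 4k -> 16k lengths are assumed for this example, not the paper's setup.
import torch

def rope_angles(seq_len: int, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    # Standard rotary frequencies: one per pair of hidden dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # scale < 1 compresses positions so the extended context stays within
    # the angle range seen during pretraining.
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)

# Extend a model pretrained at 4096 tokens to a 16384-token window.
angles = rope_angles(seq_len=16384, dim=128, scale=4096 / 16384)
```

Perplexity, the metric the study reaffirms, is the exponential of the mean next-token cross-entropy over the evaluated context. A Hugging Face causal LM returns that loss directly when given labels ("gpt2" here is a stand-in for any base model):

```python
# Illustrative perplexity computation; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str, max_length: int = 1024) -> float:
    # Truncate to the context length under evaluation.
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_length]
    with torch.no_grad():
        # Passing labels=ids yields the mean cross-entropy over
        # (internally shifted) next-token predictions.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()
```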
