Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs
September 21, 2025
Author: Norman Paulsen
cs.AI
Abstract
Large language model (LLM) providers boast big numbers for maximum context
window sizes. To test the real-world use of context windows, we 1) define a
concept of maximum effective context window, 2) formulate a testing method of a
context window's effectiveness over various sizes and problem types, and 3)
create a standardized way to compare model efficacy for increasingly larger
context window sizes to find the point of failure. We collected hundreds of
thousands of data points across several models and found significant
differences between reported Maximum Context Window (MCW) size and Maximum
Effective Context Window (MECW) size. Our findings show that the MECW is not
only drastically different from the MCW but also shifts based on the problem
type. A few top-of-the-line models in our test group failed with as little as
100 tokens in context; most had severe degradation in accuracy by 1000 tokens
in context. All models fell far short of their Maximum Context Window by as
much as 99 percent. Our data reveal that the Maximum Effective Context Window
shifts based on the type of problem provided, offering clear and actionable
insights into how to improve model accuracy and decrease model hallucination
rates.
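The probing procedure the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's actual harness: `evaluate` is a hypothetical callback standing in for running a benchmark question set against a model at a given context size, and the accuracy threshold is an assumed parameter.

```python
def find_mecw(evaluate, sizes, threshold=0.9):
    """Return the largest context size (in tokens) at which measured
    accuracy stays at or above `threshold` -- a stand-in for the
    Maximum Effective Context Window (MECW).

    `evaluate(size)` is assumed to return accuracy in [0, 1] for a
    benchmark run with `size` tokens of context.
    """
    mecw = 0
    for size in sizes:  # probe increasingly larger context windows
        if evaluate(size) < threshold:
            break  # point of failure found; stop probing
        mecw = size
    return mecw


# Simulated accuracy curve standing in for real measurements: accuracy
# holds up to 1000 tokens, then collapses (mirroring the abstract's
# observation of severe degradation by 1000 tokens for many models).
def simulated_accuracy(size):
    return 1.0 if size <= 1000 else 0.4


sizes = [100, 500, 1000, 5000, 10000]
print(find_mecw(simulated_accuracy, sizes))  # → 1000
```

In a real run, `evaluate` would be repeated per problem type, since the paper reports that the MECW shifts with the kind of problem posed.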