Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs
September 21, 2025
Author: Norman Paulsen
cs.AI
Abstract
Large language model (LLM) providers boast big numbers for maximum context
window sizes. To test the real-world use of context windows, we 1) define the
concept of a maximum effective context window, 2) formulate a method for testing
a context window's effectiveness across various sizes and problem types, and 3)
create a standardized way to compare model efficacy for increasingly larger
context window sizes to find the point of failure. We collected hundreds of
thousands of data points across several models and found significant
differences between reported Maximum Context Window (MCW) size and Maximum
Effective Context Window (MECW) size. Our findings show that the MECW is not
only drastically different from the MCW but also shifts based on the problem
type. A few top-of-the-line models in our test group failed with as little as
100 tokens in context; most had severe degradation in accuracy by 1000 tokens
in context. All models fell far short of their Maximum Context Window, by as
much as 99 percent. Our data reveals that the Maximum Effective Context Window
shifts based on the type of problem provided, offering clear and actionable
insights into how to improve model accuracy and decrease model hallucination
rates.
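
The abstract does not include the authors' test harness, but the sweep it describes, growing the context until accuracy collapses, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: `query_model`, the filler-padding strategy, the exact-match scoring, and the 0.8 accuracy threshold are all hypothetical stand-ins introduced here.

```python
# Sketch: probe a model's accuracy at increasing context sizes and report the
# largest tested size that still clears an accuracy threshold (a stand-in for
# the MECW). All names and parameters below are illustrative assumptions.

from typing import Callable


def measure_accuracy(
    query_model: Callable[[str], str],   # hypothetical model call: prompt -> answer
    tasks: list[tuple[str, str]],        # (question, expected answer) pairs
    filler: str,                         # distractor text used to pad the context
    context_tokens: int,
) -> float:
    """Run every task with roughly context_tokens of padding; score substring matches."""
    # Crude token budget: one filler word stands in for one token.
    padding = " ".join([filler] * context_tokens)
    correct = 0
    for question, expected in tasks:
        answer = query_model(f"{padding}\n\n{question}")
        correct += int(expected.lower() in answer.lower())
    return correct / len(tasks)


def find_mecw(
    query_model: Callable[[str], str],
    tasks: list[tuple[str, str]],
    sizes: list[int],
    threshold: float = 0.8,   # assumed pass bar; the paper's criterion may differ
    filler: str = "lorem",
) -> int:
    """Return the largest tested context size whose accuracy meets the threshold."""
    effective = 0
    for size in sizes:
        accuracy = measure_accuracy(query_model, tasks, filler, size)
        print(f"context={size:>7} tokens  accuracy={accuracy:.2f}")
        if accuracy >= threshold:
            effective = size
        else:
            break  # point of failure reached
    return effective


if __name__ == "__main__":
    # Toy stand-in model that degrades once the prompt gets long.
    def toy_model(prompt: str) -> str:
        return "42" if len(prompt.split()) < 1_000 else "I don't know"

    tasks = [("What is 6 * 7?", "42")]
    print("MECW ~", find_mecw(toy_model, tasks, [100, 1_000, 10_000, 100_000]), "tokens")
```

Run per problem type (the paper reports that the failure point shifts with problem type), the same sweep would yield a per-category MECW rather than a single number.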