Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs

Abstract

Large language model (LLM) providers advertise large maximum context window sizes. To test how context windows hold up in real-world use, we 1) define the concept of a maximum effective context window, 2) formulate a method for testing a context window's effectiveness across various sizes and problem types, and 3) create a standardized way to compare model efficacy at increasingly large context window sizes in order to find the point of failure. We collected hundreds of thousands of data points across several models and found significant differences between the reported Maximum Context Window (MCW) size and the Maximum Effective Context Window (MECW) size. Our findings show that the MECW is not only drastically different from the MCW but also shifts based on the problem type. A few top-of-the-line models in our test group failed with as few as 100 tokens in context; most showed severe degradation in accuracy by 1000 tokens in context. All models fell short of their Maximum Context Window, by as much as 99 percent. Our data reveal that the Maximum Effective Context Window shifts based on the type of problem provided, offering clear and actionable insights into how to improve model accuracy and decrease model hallucination rates.
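As a rough illustration of how a point of failure and the resulting gap to the advertised window might be computed, the following Python sketch takes accuracy measured at increasing context sizes and reports the largest size reached before accuracy first drops below a cutoff. The function names, the 0.9 cutoff, the 128,000-token advertised window, and the sample numbers are all assumptions made for illustration; the abstract does not specify the exact criterion used in the study.

```python
from typing import Mapping, Optional


def estimate_mecw(accuracy_by_tokens: Mapping[int, float],
                  min_accuracy: float = 0.9) -> Optional[int]:
    """Return the largest tested context size reached before accuracy first
    drops below `min_accuracy`, or None if the smallest size already fails.

    Assumption: "effective" means accuracy >= min_accuracy; the 0.9 default
    is a placeholder, not a value taken from the paper.
    """
    mecw = None
    for tokens in sorted(accuracy_by_tokens):
        if accuracy_by_tokens[tokens] < min_accuracy:
            break  # first point of failure: stop extending the window
        mecw = tokens
    return mecw


def shortfall_percent(mecw: int, mcw: int) -> float:
    """Percentage by which the effective window falls short of the advertised one."""
    if mcw <= 0:
        raise ValueError("mcw must be positive")
    return 100.0 * (1.0 - mecw / mcw)


# Made-up illustrative numbers, not measurements from the study.
example_accuracy = {100: 0.98, 1_000: 0.95, 10_000: 0.70, 100_000: 0.40}
example_mecw = estimate_mecw(example_accuracy)
# Hypothetical advertised window of 128,000 tokens.
example_gap = shortfall_percent(example_mecw, 128_000)
```

Stopping at the first failing size, rather than taking the largest passing size overall, is a deliberate choice in this sketch: it treats a window as effective only if every smaller tested size also passes.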
