Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

Abstract



LLMs operating in dynamic real-world contexts often encounter knowledge that evolves continuously or emerges incrementally. To remain accurate and effective, models must adapt to newly arriving information on the fly. We introduce Online Adaptation to Continual Knowledge Streams (OAKS), a benchmark for evaluating online adaptation to streaming, continually updating knowledge. The benchmark is structured as a sequence of fine-grained context chunks in which facts change dynamically across time intervals. OAKS comprises two datasets, OAKS-BABI and OAKS-Novel, in which individual facts evolve multiple times across context chunks. Both datasets include dense annotations for measuring whether models track these changes accurately. Evaluating 14 models with varied inference approaches, we observe significant limitations in current methods: both state-of-the-art models and agentic memory systems fail to adapt robustly on OAKS, exhibiting delayed state tracking and susceptibility to distraction in streaming environments.
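To make the streaming setup concrete, here is a minimal, hypothetical sketch of how a chunked knowledge stream with dense per-chunk annotations could be scored. It assumes each chunk is annotated with the facts it changes; the `Chunk` class, the bAbI-style example sentences, the per-chunk accuracy, and the "update delay" measure are illustrative assumptions, not the OAKS data format or its official metrics.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One context chunk in a hypothetical knowledge stream (not the OAKS file format)."""
    text: str
    # Hypothetical dense annotation: facts whose value changes in this chunk.
    updates: dict = field(default_factory=dict)


def gold_states(stream):
    """Return the ground-truth value of every fact after each chunk is read."""
    state, history = {}, []
    for chunk in stream:
        state.update(chunk.updates)
        history.append(dict(state))
    return history


def per_chunk_accuracy(predictions, gold):
    """Fraction of known facts answered correctly after each chunk (None if no facts yet)."""
    scores = []
    for pred, truth in zip(predictions, gold):
        if not truth:
            scores.append(None)
            continue
        correct = sum(pred.get(k) == v for k, v in truth.items())
        scores.append(correct / len(truth))
    return scores


def update_delays(predictions, gold):
    """For each fact change at chunk i, count chunks until the model's answer first
    matches the new value; None if it never does before the fact changes again."""
    delays = []
    for i, truth in enumerate(gold):
        prev = gold[i - 1] if i > 0 else {}
        for key, value in truth.items():
            if prev.get(key) == value:
                continue  # this fact did not change at chunk i
            delay = None
            for j in range(i, len(gold)):
                if gold[j].get(key) != value:
                    break  # fact was updated again before the model caught up
                if predictions[j].get(key) == value:
                    delay = j - i
                    break
            delays.append((key, i, delay))
    return delays


# Hypothetical toy stream in which one fact ("Mary") evolves several times.
stream = [
    Chunk("Mary went to the kitchen.", {"Mary": "kitchen"}),
    Chunk("John went to the hallway.", {"John": "hallway"}),
    Chunk("Mary moved to the garden.", {"Mary": "garden"}),
    Chunk("Mary returned to the kitchen.", {"Mary": "kitchen"}),
]

if __name__ == "__main__":
    gold = gold_states(stream)
    # Simulated model that always answers with the state from one chunk earlier.
    lagging = [{}] + gold[:-1]
    print(per_chunk_accuracy(lagging, gold))  # [0.0, 0.5, 0.5, 0.5]
    print(update_delays(lagging, gold))
    # [('Mary', 0, 1), ('John', 1, 1), ('Mary', 2, None), ('Mary', 3, None)]
```

The simulated "lagging" model shows why dense, per-chunk annotations matter: an end-of-stream check alone would only reveal that the final state is wrong, whereas per-chunk scoring exposes how long each update takes to be reflected, and which rapid successive updates are missed entirely.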
