

Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

March 8, 2026
作者: Jiyeon Kim, Hyunji Lee, Dylan Zhou, Sue Hyun Park, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Sungmin Cha, Minjoon Seo
cs.AI

Abstract

LLMs operating in dynamic real-world contexts often encounter knowledge that evolves continuously or emerges incrementally. To remain accurate and effective, models must adapt to newly arriving information on the fly. We introduce Online Adaptation to Continual Knowledge Streams (OAKS) to evaluate this capability, establishing a benchmark for online adaptation over streaming, continually updating knowledge. Specifically, the benchmark is structured as a sequence of fine-grained context chunks in which facts change dynamically across time intervals. OAKS comprises two datasets, OAKS-BABI and OAKS-Novel, in which individual facts evolve multiple times across context chunks. These datasets include dense annotations to measure whether models track changes accurately. Evaluating 14 models with varied inference approaches, we observe significant limitations in current methodologies: both state-of-the-art models and agentic memory systems fail to adapt robustly on OAKS, exhibiting delays in state tracking and susceptibility to distraction in streaming environments.