Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

Abstract

LLMs operating in dynamic real-world environments frequently encounter knowledge that evolves continuously or emerges incrementally. To remain accurate and effective, models must adapt to newly arriving information on the fly. To evaluate this capability, we introduce Online Adaptation to Continual Knowledge Streams (OAKS), a benchmark for online adaptation over streaming, continually updated knowledge. The benchmark is structured as a sequence of fine-grained context chunks in which facts change dynamically across time intervals. OAKS comprises two datasets, OAKS-BABI and OAKS-Novel, in which individual facts change multiple times across context chunks. Both datasets include dense annotations that measure whether models track these changes accurately. Evaluating 14 models with a variety of inference approaches, we observe significant limitations in current methods. Both state-of-the-art models and agentic memory systems fail to adapt robustly on OAKS, exhibiting delayed state tracking and susceptibility to distraction in streaming settings.
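The streaming setup described in the abstract can be pictured with a small toy loop: chunks arrive in order, the model updates its state after each one, and annotated questions are scored at every step. The following is a minimal sketch under stated assumptions, not the OAKS data format or evaluation code. The `Chunk` class, the question-to-answer annotation layout, the `ToyTracker` baseline, and the `evaluate_stream` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One context chunk plus the answers that hold after it (hypothetical layout)."""
    text: str
    # Assumed dense annotation: question key -> gold answer valid at this step.
    gold: dict[str, str] = field(default_factory=dict)


class ToyTracker:
    """Toy baseline that parses '<entity> is in <place>' sentences.

    With overwrite=True it keeps the latest statement; with overwrite=False
    it keeps the first one, which mimics a model that never updates.
    """

    def __init__(self, overwrite: bool = True) -> None:
        self.overwrite = overwrite
        self.state: dict[str, str] = {}

    def update(self, text: str) -> None:
        for sentence in text.split("."):
            words = sentence.split()
            if len(words) >= 4 and words[1:3] == ["is", "in"]:
                entity, place = words[0], " ".join(words[3:])
                if self.overwrite or entity not in self.state:
                    self.state[entity] = place

    def answer(self, question: str) -> str:
        return self.state.get(question, "unknown")


def evaluate_stream(chunks: list[Chunk], model: ToyTracker) -> float:
    """Feed chunks in order and score every annotated question after each chunk."""
    correct = 0
    total = 0
    for chunk in chunks:
        model.update(chunk.text)  # online adaptation step
        for question, gold in chunk.gold.items():
            total += 1
            correct += int(model.answer(question) == gold)
    return correct / total if total else 0.0


# A three-chunk stream in which one fact (Mary's location) changes once.
stream = [
    Chunk("Mary is in the kitchen. John is in the hall.",
          {"Mary": "the kitchen", "John": "the hall"}),
    Chunk("Mary is in the garden.",
          {"Mary": "the garden", "John": "the hall"}),
    Chunk("The weather is nice.",
          {"Mary": "the garden", "John": "the hall"}),
]

adaptive_accuracy = evaluate_stream(stream, ToyTracker(overwrite=True))
stale_accuracy = evaluate_stream(stream, ToyTracker(overwrite=False))
print(adaptive_accuracy, stale_accuracy)
```

In this toy stream the tracker that overwrites its state answers every question correctly, while the one that never updates loses accuracy on every step after the fact changes. Scoring at each step rather than only at the end is what makes that kind of lag visible.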
