Panini: Continual Learning in Token Space via Structured Memory

Abstract

Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data. A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over. However, this uses test-time compute inefficiently (the LLM repeatedly reasons over the same documents); moreover, chunk retrieval can inject irrelevant context that increases unsupported generation. We propose a human-like non-parametric continual learning framework in which the base model remains fixed and learning occurs by integrating each new experience into an external semantic memory state that accumulates and consolidates itself continually. We present Panini, which realizes this by representing documents as Generative Semantic Workspaces (GSW): an entity- and event-aware network of question-answer (QA) pairs, sufficient for an LLM to reconstruct the experienced situations and mine latent knowledge via reasoning-grounded inference chains on the network. Given a query, Panini traverses only the continually updated GSW (not the verbatim documents or chunks) and retrieves the most likely inference chains. Across six QA benchmarks, Panini achieves the highest average performance, 5%-7% higher than other competitive baselines, while using 2-30x fewer answer-context tokens. It also supports fully open-source pipelines and reduces unsupported answers on curated unanswerable queries. The results show that efficient and accurate structuring of experiences at write time, as achieved by the GSW framework, yields both efficiency and reliability gains at read time. Code is available at https://github.com/roychowdhuryresearch/gsw-memory.
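The write-time/read-time split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not reflect the API of the linked repository: the `QAPair` schema, the `ToyWorkspace` class, and the example facts are all hypothetical. It also omits the step that ranks chains by likelihood and simply enumerates chains up to a fixed number of hops.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One memory node (hypothetical schema): a question about an entity and its answer."""
    entity: str    # entity the question is about
    question: str
    answer: str    # may name another entity, which links two QA pairs


class ToyWorkspace:
    """Toy stand-in for a network of QA pairs; an assumption, not the GSW code."""

    def __init__(self):
        self._by_entity = defaultdict(list)

    def write(self, pair):
        # Write time: integrate a new experience into the external memory state.
        self._by_entity[pair.entity].append(pair)

    def chains(self, start_entity, max_hops=2):
        # Read time: enumerate QA chains reachable from the start entity,
        # following answers that are themselves entities in memory.
        results = []

        def walk(entity, path):
            for pair in self._by_entity.get(entity, []):
                new_path = path + [pair]
                results.append(new_path)
                if len(new_path) < max_hops and pair.answer in self._by_entity:
                    walk(pair.answer, new_path)

        walk(start_entity, [])
        return results


# Made-up facts, written once and then queried without the source text.
memory = ToyWorkspace()
memory.write(QAPair("Ada", "Where does Ada work?", "Acme Labs"))
memory.write(QAPair("Acme Labs", "Where is Acme Labs based?", "Lisbon"))

chains = memory.chains("Ada")
two_hop = [chain for chain in chains if len(chain) == 2]

# Only the QA pairs on the chain form the answer context, not whole documents.
answer_context = " ".join(f"{p.question} {p.answer}" for p in two_hop[0])
print(answer_context)
```

In this sketch, a two-hop question such as "In which city is Ada's employer based?" is answered from two short QA pairs rather than from retrieved chunks, which is the intuition behind the smaller answer context reported in the abstract.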
