
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

October 13, 2024
作者: Yein Park, Chanwoong Yoon, Jungwoo Park, Donghyeon Lee, Minbyul Jeong, Jaewoo Kang
cs.AI

Abstract

Large language models (LLMs) have significantly impacted many aspects of our lives. However, assessing and ensuring their chronological knowledge remains challenging. Existing approaches fall short in addressing the accumulative nature of knowledge, often relying on a single time stamp. To overcome this, we introduce ChroKnowBench, a benchmark dataset designed to evaluate chronologically accumulated knowledge across three key aspects: multiple domains, time dependency, and temporal state. Our benchmark distinguishes between knowledge that evolves (e.g., scientific discoveries, amended laws) and knowledge that remains constant (e.g., mathematical truths, commonsense facts). Building on this benchmark, we present ChroKnowledge (Chronological Categorization of Knowledge), a novel sampling-based framework for evaluating and updating LLMs' non-parametric chronological knowledge. Our evaluation shows: (1) The ability to elicit temporal knowledge varies depending on the data format the model was trained on. (2) LLMs partially recall knowledge or show a cut-off at temporal boundaries rather than recalling all aspects of knowledge correctly. Thus, we apply our ChroKnowPrompt, an in-depth prompting strategy that elicits chronological knowledge by traversing step by step through the surrounding time spans. We observe that our framework successfully updates the overall knowledge across the entire timeline in both the biomedical domain (+11.9%) and the general domain (+2.8%), demonstrating its effectiveness in refining temporal knowledge. This non-parametric approach also enables knowledge updates not only in open-source models but also in proprietary LLMs, ensuring comprehensive applicability across model types. We perform a comprehensive analysis based on the temporal characteristics of ChroKnowPrompt and validate the potential of various models to elicit intrinsic temporal knowledge through our method.
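The traversal idea behind ChroKnowPrompt can be illustrated with a minimal sketch: starting from a target year, visit neighboring years in alternating outward steps, querying the model at each step while feeding previously elicited answers back as chronological context. All names here (`traverse_time_spans`, `query_model`, the toy model) are hypothetical illustrations, not the paper's actual API or prompts.

```python
def traverse_time_spans(target_year, span, query_model):
    """Elicit the model's answer for the target year and its neighbors.

    Visits the target year first, then alternates outward
    (-1, +1, -2, +2, ...) up to `span` steps in each direction,
    passing already-elicited answers along as temporal context.
    """
    answers = {}
    offsets = [0]
    for d in range(1, span + 1):
        offsets += [-d, d]
    for off in offsets:
        year = target_year + off
        # A real prompt would embed `answers` as chronological context
        # so the model can reason across adjacent time spans.
        answers[year] = query_model(year, dict(answers))
    return answers


# Toy stand-in "model": pretends a fact changed in 2021
# (e.g., an amended law taking a new value).
def toy_model(year, context):
    return "object_B" if year >= 2021 else "object_A"


result = traverse_time_spans(2021, 2, toy_model)
# Covers 2019-2023, capturing the change of the fact at the 2021 boundary.
```

The alternating outward order mirrors the paper's "step-by-step traversal through surrounding time spans": nearby years are probed first, so their answers are available as context when more distant years are queried.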

