Benchmarking Chinese Knowledge Rectification in Large Language Models
September 9, 2024
Authors: Tianhe Lu, Jizhan Fang, Yunzhi Yao, Xin Xu, Ningyu Zhang, Huajun Chen
cs.AI
Abstract
While Large Language Models (LLMs) exhibit remarkable generative
capabilities, they are not without flaws, particularly in the form of
hallucinations. This issue is even more pronounced when LLMs are applied to
specific languages and domains. For example, LLMs may generate nonsense
information when handling Chinese ancient poetry, proverbs, or idioms, owing to
the lack of specific knowledge. To this end, this paper introduces a benchmark
for rectifying Chinese knowledge in LLMs via knowledge editing. Specifically,
we introduce a new Chinese dataset, CKnowEdit, by collecting seven types of
knowledge from various sources, including classical texts, idioms, and content
from Baidu Tieba Ruozhiba, thereby accounting for the unique polyphony,
antithesis, and logical constructs inherent in the Chinese language. Through
the analysis of this dataset, we uncover the challenges faced by current LLMs
in mastering Chinese. Furthermore, our evaluation of state-of-the-art knowledge
editing techniques on this dataset unveils the substantial scope for advancement
in the rectification of Chinese knowledge. Code and dataset are available at
https://github.com/zjunlp/EasyEdit.
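To illustrate the kind of evaluation the abstract describes, the sketch below computes a simple edit-success (reliability) score over CKnowEdit-style records. The record fields (`prompt`, `target_new`) and the stub "edited model" are illustrative assumptions, not the dataset's actual schema; see the EasyEdit repository for the real format.

```python
# Minimal sketch of a reliability metric for knowledge editing.
# The field names and the lookup-table "model" are hypothetical stand-ins.

def edit_success_rate(model, records):
    """Fraction of edited facts the model now answers correctly
    (here judged by substring match against the target answer)."""
    if not records:
        return 0.0
    hits = 0
    for rec in records:
        answer = model(rec["prompt"])
        if rec["target_new"] in answer:
            hits += 1
    return hits / len(records)

# Stub "edited model": a lookup table standing in for an LLM after editing.
edited = {"「春眠不覺曉」的下一句是？": "處處聞啼鳥"}.get

records = [
    {"prompt": "「春眠不覺曉」的下一句是？", "target_new": "處處聞啼鳥"},
    {"prompt": "成語「空穴來風」的本義是？", "target_new": "有根據的傳聞"},
]

# The stub answers only the first prompt, so half the edits succeed.
print(edit_success_rate(lambda p: edited(p) or "", records))  # → 0.5
```

A full evaluation would also probe generalization (rephrased prompts) and locality (unrelated facts left unchanged), which is how editing methods are typically compared on benchmarks like this one.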