Benchmarking Chinese Knowledge Rectification in Large Language Models
September 9, 2024
Authors: Tianhe Lu, Jizhan Fang, Yunzhi Yao, Xin Xu, Ningyu Zhang, Huajun Chen
cs.AI
Abstract
While Large Language Models (LLMs) exhibit remarkable generative
capabilities, they are not without flaws, particularly in the form of
hallucinations. This issue is even more pronounced when LLMs are applied to
specific languages and domains. For example, LLMs may generate nonsensical
information when handling Chinese ancient poetry, proverbs, or idioms, owing to
the lack of specific knowledge. To this end, this paper introduces a benchmark
for rectifying Chinese knowledge in LLMs via knowledge editing. Specifically,
we introduce a new Chinese dataset, CKnowEdit, by collecting seven types of
knowledge from various sources, including classical texts, idioms, and content
from Baidu Tieba Ruozhiba, thereby accounting for the unique polyphony,
antithesis, and logical constructs inherent in the Chinese language. Through
the analysis of this dataset, we uncover the challenges faced by current LLMs
in mastering Chinese. Furthermore, our evaluation of state-of-the-art knowledge
editing techniques on this dataset unveils the substantial scope for advancement
in the rectification of Chinese knowledge. Code and dataset are available at
https://github.com/zjunlp/EasyEdit.