
Language Models' Factuality Depends on the Language of Inquiry

February 25, 2025
Authors: Tushar Aggarwal, Kumar Tanmay, Ayush Agrawal, Kumar Ayush, Hamid Palangi, Paul Pu Liang
cs.AI

Abstract

Multilingual language models (LMs) are expected to recall factual knowledge consistently across languages, yet they often fail to transfer knowledge between languages even when they possess the correct information in one of the languages. For example, we find that an LM may correctly identify Rashed Al Shashai as being from Saudi Arabia when asked in Arabic, but consistently fail to do so when asked in English or Swahili. To systematically investigate this limitation, we introduce a benchmark of 10,000 country-related facts across 13 languages and propose three novel metrics, Factual Recall Score, Knowledge Transferability Score, and Cross-Lingual Factual Knowledge Transferability Score, to quantify factual recall and knowledge transferability in LMs across different languages. Our results reveal fundamental weaknesses in today's state-of-the-art LMs, particularly in cross-lingual generalization, where models fail to transfer knowledge effectively across languages, leading to inconsistent performance that is sensitive to the language used. Our findings emphasize the need for LMs to recognize language-specific factual reliability and leverage the most trustworthy information across languages. We release our benchmark and evaluation framework to drive future research in multilingual knowledge transfer.
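
The abstract names the three metrics but does not give their formulas. As a rough, hypothetical illustration only (the paper's actual definitions may differ), the Python sketch below treats per-language factual recall as the fraction of benchmark facts answered correctly, and knowledge transferability as the ratio of the worst-language recall to the best-language recall; all function names, data structures, and numbers are invented for the example and are not taken from the paper.

```python
def factual_recall_score(responses, gold):
    """Hypothetical per-language recall: fraction of benchmark facts answered
    correctly. `responses` and `gold` map fact IDs to the model's answer and
    the reference answer, respectively (exact-match comparison for simplicity)."""
    correct = sum(
        1 for fact_id, answer in responses.items()
        if answer.strip().lower() == gold[fact_id].strip().lower()
    )
    return correct / len(gold)


def knowledge_transferability_score(recall_by_language):
    """Hypothetical transferability measure: ratio of the worst-language recall
    to the best-language recall. 1.0 means recall is identical across languages;
    values near 0 indicate strongly language-dependent recall."""
    values = list(recall_by_language.values())
    best = max(values)
    return min(values) / best if best > 0 else 0.0


# Toy usage with made-up answers (illustration only, not results from the paper).
gold = {"f1": "Saudi Arabia", "f2": "Japan"}
responses_en = {"f1": "Qatar", "f2": "Japan"}
responses_ar = {"f1": "Saudi Arabia", "f2": "Japan"}

recall = {
    "en": factual_recall_score(responses_en, gold),  # 0.5
    "ar": factual_recall_score(responses_ar, gold),  # 1.0
}
print(recall, knowledge_transferability_score(recall))  # {'en': 0.5, 'ar': 1.0} 0.5
```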