Language Models' Factuality Depends on the Language of Inquiry

The factuality of language models varies significantly with the language used for inquiry. This phenomenon, known as "language-dependent factuality," highlights the challenges of developing truly multilingual AI systems. Our research demonstrates that even state-of-the-art models exhibit substantial discrepancies in factual accuracy across languages, particularly low-resource ones. These findings underscore the need for more robust evaluation metrics and training approaches that account for linguistic diversity and ensure consistent factual reliability across all supported languages.