
Investigating Hallucination in Conversations for Low Resource Languages

July 30, 2025
作者: Amit Das, Md. Najib Hasan, Souvika Sarkar, Zheng Zhang, Fatemeh Jamshidi, Tathagata Bhattacharya, Nilanjana Raychawdhury, Dongji Feng, Vinija Jain, Aman Chadha
cs.AI

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often produce factually incorrect statements, a problem typically referred to as 'hallucination'. Addressing hallucination is crucial for enhancing the reliability and effectiveness of LLMs. While much research has focused on hallucinations in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We offer a comprehensive analysis of a dataset to examine both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We found that LLMs produce very few hallucinated responses in Mandarin but generate a significantly higher number of hallucinations in Hindi and Farsi.