Investigating Hallucination in Conversations for Low Resource Languages

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often generate factually incorrect statements, a problem commonly referred to as "hallucination". Addressing hallucination is crucial for improving the reliability and effectiveness of LLMs. While much research has focused on hallucinations in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We offer a comprehensive analysis of a dataset to examine both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We found that LLMs produce very few hallucinated responses in Mandarin but generate a significantly higher number of hallucinations in Hindi and Farsi.
