Investigating Hallucination in Conversations for Low Resource Languages

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often generate factually incorrect statements, a problem typically referred to as 'hallucination'. Addressing hallucination is crucial for enhancing the reliability and effectiveness of LLMs. While much research has focused on hallucinations in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We offer a comprehensive analysis of a dataset to examine both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We found that LLMs produce very few hallucinated responses in Mandarin but generate a significantly higher number of hallucinations in Hindi and Farsi.
