BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation

October 2, 2024
Authors: Bryan Li, Samar Haider, Fiona Luo, Adwait Agashe, Chris Callison-Burch
cs.AI

Abstract

Large language models excel at creative generation but continue to struggle with the issues of hallucination and bias. While retrieval-augmented generation (RAG) provides a framework for grounding LLMs' responses in accurate and up-to-date information, it still raises the question of bias: which sources should be selected for inclusion in the context? And how should their importance be weighted? In this paper, we study the challenge of cross-lingual RAG and present a dataset to investigate the robustness of existing systems at answering queries about geopolitical disputes, which exist at the intersection of linguistic, cultural, and political boundaries. Our dataset is sourced from Wikipedia pages containing information relevant to the given queries and we investigate the impact of including additional context, as well as the composition of this context in terms of language and source, on an LLM's response. Our results show that existing RAG systems continue to be challenged by cross-lingual use cases and suffer from a lack of consistency when they are provided with competing information in multiple languages. We present case studies to illustrate these issues and outline steps for future research to address these challenges. We make our dataset and code publicly available at https://github.com/manestay/bordIRlines.
