Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
September 17, 2024
Authors: Maojia Song, Shang Hong Sim, Rishabh Bhardwaj, Hai Leong Chieu, Navonil Majumder, Soujanya Poria
cs.AI
Abstract
LLMs are an integral part of retrieval-augmented generation (RAG) systems.
While many studies focus on evaluating the quality of end-to-end RAG systems,
there is a lack of research on understanding the appropriateness of an LLM for
the RAG task. Thus, we introduce a new metric, Trust-Score, that provides a
holistic evaluation of the trustworthiness of LLMs in an RAG framework. We show
that various prompting methods, such as in-context learning, fail to adapt LLMs
effectively to the RAG task. Thus, we propose Trust-Align, a framework to align
LLMs for higher Trust-Score. LLaMA-3-8b, aligned with our method, significantly
outperforms open-source LLMs of comparable sizes on ASQA (up 10.7), QAMPARI (up
29.2) and ELI5 (up 14.9). We release our code at:
https://github.com/declare-lab/trust-align.
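To make the idea of a composite trustworthiness metric concrete, below is a minimal illustrative sketch, not the paper's actual Trust-Score definition (that lives in the paper and the linked repository). It assumes, purely for illustration, that trustworthiness aggregates three signals mentioned in the abstract's framing: refusing unanswerable questions while answering answerable ones, answer correctness, and grounded citations. The class name `RAGPrediction`, the component names, and the simple averaging are all assumptions introduced here.

```python
# Illustrative sketch of a composite trustworthiness score for RAG outputs.
# NOT the official Trust-Score; component choices and weighting are assumptions.
from dataclasses import dataclass


@dataclass
class RAGPrediction:
    answered: bool            # model gave an answer rather than refusing
    answerable: bool          # retrieved documents actually support an answer
    answer_correct: bool      # answer matches the reference (when answered)
    citations_supported: int  # cited passages that entail the generated claim
    citations_total: int      # total citations emitted


def trust_score(preds: list[RAGPrediction]) -> float:
    """Average of refusal behaviour, answer correctness, and citation precision."""
    # Grounded refusal: answer iff the question is answerable from the context.
    refusal_ok = sum(p.answered == p.answerable for p in preds) / len(preds)

    # Answer correctness among the questions the model chose to answer.
    answered = sum(p.answered for p in preds) or 1
    answer_quality = sum(p.answer_correct for p in preds if p.answered) / answered

    # Citation precision: fraction of emitted citations that support the claim.
    cited_all = sum(p.citations_total for p in preds) or 1
    citation_precision = sum(p.citations_supported for p in preds) / cited_all

    return (refusal_ok + answer_quality + citation_precision) / 3


# Toy usage with three predictions, including one correct and one missed refusal.
preds = [
    RAGPrediction(True, True, True, 2, 2),
    RAGPrediction(False, False, False, 0, 0),  # correct refusal
    RAGPrediction(True, False, False, 0, 1),   # should have refused
]
print(f"Illustrative trust score: {trust_score(preds):.2f}")
```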