How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions

July 6, 2024
Authors: Bojana Bašaragin, Adela Ljajić, Darija Medvecki, Lorenzo Cassano, Miloš Košprdić, Nikola Milošević
cs.AI

Abstract

Large language models (LLMs) have recently become the leading source of answers for users' questions online. Despite their ability to offer eloquent answers, their accuracy and reliability can pose a significant challenge. This is especially true for sensitive domains such as biomedicine, where there is a higher need for factually correct answers. This paper introduces a biomedical retrieval-augmented generation (RAG) system designed to enhance the reliability of generated responses. The system is based on an LLM fine-tuned for referenced question answering, where relevant abstracts retrieved from PubMed are passed to the LLM's context as input through a prompt. Its output is an answer grounded in PubMed abstracts, where each statement is referenced accordingly, allowing users to verify the answer. Our retrieval system achieves an absolute improvement of 23% over the PubMed search engine. Based on a manual evaluation of a small sample, our fine-tuned LLM component achieves results comparable to GPT-4 Turbo in referencing relevant abstracts. We make the dataset used to fine-tune the models, as well as the fine-tuned models based on Mistral-7B-instruct-v0.1 and v0.2, publicly available.
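
To illustrate the referenced question-answering setup the abstract describes (retrieved PubMed abstracts placed into the LLM's context via a prompt, with each statement in the answer citing a supporting abstract), here is a minimal sketch of the prompt-construction step. It is a hypothetical illustration, not the authors' released code; the function name, the instruction wording, and the 'pmid'/'text' fields are assumptions.

def build_referenced_qa_prompt(question: str, abstracts: list[dict]) -> str:
    """Format retrieved PubMed abstracts and a question into one prompt.

    Each abstract dict is assumed (hypothetically) to carry 'pmid' and 'text'.
    """
    # Number the abstracts so the model can cite them as [1], [2], ...
    context_lines = []
    for i, abstract in enumerate(abstracts, start=1):
        context_lines.append(f"[{i}] (PMID {abstract['pmid']}) {abstract['text']}")
    context = "\n".join(context_lines)

    # Instruct the model to ground every statement in a cited abstract.
    return (
        "Answer the question using only the abstracts below. "
        "After every statement, cite the supporting abstract as [number].\n\n"
        f"Abstracts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    sample_abstracts = [
        {"pmid": "00000000", "text": "Example abstract text about a biomedical topic."},
    ]
    print(build_referenced_qa_prompt("Is treatment X associated with outcome Y?", sample_abstracts))

The resulting prompt string would then be passed to the fine-tuned Mistral-7B-instruct model (or another LLM) for generation; the exact prompt template used in the paper may differ.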
