How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions
July 6, 2024
Authors: Bojana Bašaragin, Adela Ljajić, Darija Medvecki, Lorenzo Cassano, Miloš Košprdić, Nikola Milošević
cs.AI
Abstract
Large language models (LLMs) have recently become the leading source of
answers for users' questions online. Despite their ability to offer eloquent
answers, their accuracy and reliability can pose a significant challenge. This
is especially true for sensitive domains such as biomedicine, where there is a
higher need for factually correct answers. This paper introduces a biomedical
retrieval-augmented generation (RAG) system designed to enhance the reliability
of generated responses. The system is built on an LLM fine-tuned for referenced
question answering: relevant abstracts retrieved from PubMed are passed into
the LLM's context as input through a prompt. Its output is an answer
based on PubMed abstracts, where each statement is referenced accordingly,
allowing the users to verify the answer. Our retrieval system achieves an
absolute improvement of 23% over the PubMed search engine. Based on a manual
evaluation of a small sample, our fine-tuned LLM component achieves results
comparable to those of GPT-4 Turbo in referencing relevant abstracts. We make
the dataset used to fine-tune the models and the fine-tuned models based on
Mistral-7B-instruct-v0.1 and v0.2 publicly available.
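The referenced-RAG flow described in the abstract can be sketched as a prompt-assembly step: retrieved abstracts are numbered and placed in the model's context, and the instruction asks for per-statement citations. This is a minimal illustrative sketch, not the paper's actual implementation; the function name, prompt wording, and example question are hypothetical.

```python
# Hypothetical sketch of referenced-RAG prompt assembly (not the paper's code).
def build_prompt(question: str, abstracts: list[str]) -> str:
    """Assemble a prompt that asks the model to answer using only the
    numbered PubMed abstracts, citing each statement by abstract number."""
    # Number each retrieved abstract so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i + 1}] {a}" for i, a in enumerate(abstracts))
    return (
        "Answer the question using only the abstracts below. "
        "Reference each statement with the abstract number in brackets.\n\n"
        f"Abstracts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example with placeholder retrieved abstracts (fabricated for illustration).
prompt = build_prompt(
    "Does metformin reduce cancer risk?",
    [
        "Metformin use was associated with lower cancer incidence in ...",
        "A cohort study found no significant association between ...",
    ],
)
```

The resulting string would then be sent to the fine-tuned Mistral-7B-instruct model, whose answer is expected to carry `[n]` markers that let the user trace each statement back to a PubMed abstract.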