How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions

Abstract

Large language models (LLMs) have recently become the leading source of answers to users' questions online. Despite their ability to offer eloquent answers, their accuracy and reliability can pose a significant challenge. This is especially true in sensitive domains such as biomedicine, where there is a greater need for factually correct answers. This paper introduces a biomedical retrieval-augmented generation (RAG) system designed to enhance the reliability of generated responses. The system is based on an LLM fine-tuned for referenced question answering: relevant abstracts retrieved from PubMed are passed to the LLM's context as input through a prompt. Its output is an answer grounded in PubMed abstracts, in which each statement is referenced accordingly, allowing users to verify the answer. Our retrieval system achieves an absolute improvement of 23% over the PubMed search engine. Based on manual evaluation of a small sample, our fine-tuned LLM component achieves results comparable to GPT-4 Turbo in referencing relevant abstracts. We make publicly available the dataset used to fine-tune the models, as well as the fine-tuned models based on Mistral-7B-instruct-v0.1 and v0.2.
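To make the referencing idea concrete, here is a minimal sketch of the two ends of such a pipeline: packing retrieved abstracts into a prompt, each tagged with its PubMed identifier, and then checking which identifiers the generated answer cites. Everything here is an illustrative assumption rather than the paper's method: the prompt wording, the `[PMID]` citation format, the placeholder PMIDs, and the hypothetical names `Abstract`, `build_prompt`, `cited_pmids` and `check_citations`. Retrieval and the actual LLM call are omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Abstract:
    pmid: str   # PubMed identifier of a retrieved abstract
    title: str
    text: str


def build_prompt(question: str, abstracts: list[Abstract]) -> str:
    """Hypothetical referenced-QA prompt: each abstract is labelled with its
    PMID so the model can cite it after every statement as [PMID]."""
    context = "\n\n".join(
        f"PMID: {a.pmid}\nTitle: {a.title}\n{a.text}" for a in abstracts
    )
    return (
        "Answer the question using only the abstracts below. "
        "After each statement, cite the supporting abstract(s) as [PMID].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Assumed citation format: "[123]" or a comma-separated list such as "[123, 456]".
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def cited_pmids(answer: str) -> set[str]:
    """Collect every PMID cited anywhere in the generated answer."""
    pmids: set[str] = set()
    for group in CITATION.findall(answer):
        pmids.update(p.strip() for p in group.split(","))
    return pmids


def check_citations(answer: str, abstracts: list[Abstract]) -> dict[str, set[str]]:
    """Split cited PMIDs into those present in the prompt context (verifiable
    by the user) and those that are not (unsupported references)."""
    retrieved = {a.pmid for a in abstracts}
    cited = cited_pmids(answer)
    return {"grounded": cited & retrieved, "unsupported": cited - retrieved}


# Usage with placeholder data (not real PubMed records):
docs = [
    Abstract("111", "Placeholder title A", "..."),
    Abstract("222", "Placeholder title B", "..."),
]
answer = "Drug X lowers blood pressure [111]. It is well tolerated [111, 333]."
print(check_citations(answer, docs))
# → {'grounded': {'111'}, 'unsupported': {'333'}}
```

Restricting citations to identifiers that actually appear in the prompt is one simple way to let a user, or an automatic check, trace each statement back to a specific abstract.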
