MIRIAD: Augmenting Large Language Models with Millions of Medical Question-Answer Pairs

Abstract

Large language models (LLMs) are bound to transform healthcare with advanced decision support and flexible chat assistants. However, LLMs are prone to generating inaccurate medical content. To ground LLMs in high-quality medical knowledge, they have been equipped with external knowledge via retrieval-augmented generation (RAG), in which unstructured medical knowledge is split into small text chunks that can be selectively retrieved and integrated into the LLM's context. Yet existing RAG pipelines rely on raw, unstructured medical text, which can be noisy, uncurated, and difficult for LLMs to leverage effectively. Systematic approaches for organizing medical knowledge so that it is best surfaced to LLMs are generally lacking. To address these challenges, we introduce MIRIAD, a large-scale, curated corpus of 5,821,948 medical QA pairs, each rephrased from and grounded in a passage from peer-reviewed medical literature using a semi-automated pipeline that combines LLM generation, filtering, grounding, and human annotation. Unlike prior medical corpora, which rely on unstructured text, MIRIAD encapsulates web-scale medical knowledge in an operationalized query-response format, which enables more targeted retrieval. Experiments on challenging medical QA benchmarks show that augmenting LLMs with MIRIAD improves accuracy by up to 6.7% compared to unstructured RAG baselines that use the same source corpus and the same amount of retrieved text. Moreover, MIRIAD improves the ability of LLMs to detect medical hallucinations by 22.5% to 37% (increase in F1 score). We further introduce MIRIAD-Atlas, an interactive map of MIRIAD spanning 56 medical disciplines, which enables clinical users to visually explore, search, and refine medical knowledge. MIRIAD promises to unlock a wealth of downstream applications, including medical information retrievers, enhanced RAG applications, and knowledge-grounded chat interfaces, ultimately enabling more reliable LLM applications in healthcare.
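To make the query-response retrieval idea concrete, the following is a minimal sketch of retrieving QA pairs and placing them in an LLM prompt. It is not the pipeline used for MIRIAD: the three QA pairs are toy examples invented for illustration, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever retriever a real system would use.

```python
import math
import re
from collections import Counter

# Toy QA pairs invented for illustration; they are NOT taken from MIRIAD.
QA_PAIRS = [
    ("What vitamin deficiency causes scurvy?",
     "Scurvy is caused by vitamin C deficiency."),
    ("Which hormone lowers blood glucose?",
     "Insulin lowers blood glucose."),
    ("Which organ produces bile?",
     "Bile is produced by the liver."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, qa_pairs, k=1):
    """Return the k QA pairs most similar to the query."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(
        qa_pairs,
        key=lambda qa: cosine(q_vec, Counter(tokenize(qa[0] + " " + qa[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, qa_pairs, k=1):
    """Insert the retrieved QA pairs into the prompt as context."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(query, qa_pairs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Calling `build_prompt("Which hormone lowers glucose?", QA_PAIRS)` returns a prompt string whose context is the retrieved question and its answer, ready to be passed to an LLM. An unstructured baseline would differ only in what is indexed: raw text chunks instead of QA pairs.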