MIRIAD: Augmenting LLMs with millions of medical query-response pairs
June 6, 2025
Authors: Qinyue Zheng, Salman Abdullah, Sam Rawal, Cyril Zakka, Sophie Ostmeier, Maximilian Purk, Eduardo Reis, Eric J. Topol, Jure Leskovec, Michael Moor
cs.AI
Abstract
LLMs are bound to transform healthcare with advanced decision support and
flexible chat assistants. However, LLMs are prone to generate inaccurate
medical content. To ground LLMs in high-quality medical knowledge, LLMs have
been equipped with external knowledge via retrieval-augmented generation (RAG), where unstructured medical
knowledge is split into small text chunks that can be selectively retrieved and
integrated into the LLM's context. Yet, existing RAG pipelines rely on raw,
unstructured medical text, which can be noisy, uncurated and difficult for LLMs
to effectively leverage. Systematic approaches to organize medical knowledge to
best surface it to LLMs are generally lacking. To address these challenges, we
introduce MIRIAD, a large-scale, curated corpus of 5,821,948 medical QA pairs,
each rephrased from and grounded in a passage from peer-reviewed medical
literature using a semi-automated pipeline combining LLM generation, filtering,
grounding, and human annotation. Unlike prior medical corpora, which rely on
unstructured text, MIRIAD encapsulates web-scale medical knowledge in an
operationalized query-response format, which enables more targeted retrieval.
Experiments on challenging medical QA benchmarks show that augmenting LLMs with
MIRIAD improves accuracy by up to 6.7% compared to unstructured RAG baselines with
the same source corpus and with the same amount of retrieved text. Moreover,
MIRIAD improves the ability of LLMs to detect medical hallucinations by 22.5 to
37% (increase in F1 score). We further introduce MIRIAD-Atlas, an interactive
map of MIRIAD spanning 56 medical disciplines, enabling clinical users to
visually explore, search, and refine medical knowledge. MIRIAD promises to
unlock a wealth of downstream applications, including medical information
retrievers, enhanced RAG applications, and knowledge-grounded chat interfaces,
ultimately enabling more reliable LLM applications in healthcare.
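
To make the idea of retrieving from a query-response corpus concrete, here is a minimal sketch. It is not the MIRIAD pipeline: the two QA pairs are toy placeholders rather than MIRIAD entries, the bag-of-words cosine similarity stands in for whatever retriever a real system would use, and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy placeholder entries (assumption: NOT taken from MIRIAD).
qa_pairs = [
    {"question": "Which vitamin deficiency causes scurvy?",
     "answer": "Scurvy results from vitamin C deficiency."},
    {"question": "Which organ produces insulin?",
     "answer": "Insulin is produced by beta cells in the pancreas."},
]

def retrieve(query, pairs, k=1):
    """Return the k QA pairs most similar to the query (stand-in retriever)."""
    q_vec = Counter(tokenize(query))
    def score(pair):
        return cosine(q_vec, Counter(tokenize(pair["question"] + " " + pair["answer"])))
    return sorted(pairs, key=score, reverse=True)[:k]

def build_prompt(question, retrieved):
    """Place the retrieved QA pairs into the context ahead of the user question."""
    context = "\n".join(f"Q: {p['question']}\nA: {p['answer']}" for p in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

if __name__ == "__main__":
    question = "What causes scurvy?"
    print(build_prompt(question, retrieve(question, qa_pairs)))
```

An unstructured RAG baseline would index raw passage chunks in place of `qa_pairs`; the retrieval and prompt-building steps would otherwise look the same, which is what makes the comparison described in the abstract a like-for-like one.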