Boosting Healthcare LLMs Through Retrieved Context

September 23, 2024
Authors: Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Dario Garcia-Gasulla
cs.AI

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing, yet their factual inaccuracies and hallucinations limit their application, particularly in critical domains like healthcare. Context retrieval methods, which introduce relevant information as input, have emerged as a crucial approach for enhancing LLM factuality and reliability. This study explores the boundaries of context retrieval methods within the healthcare domain, optimizing their components and benchmarking their performance against open and closed alternatives. Our findings reveal how open LLMs, when augmented with an optimized retrieval system, can achieve performance comparable to the largest private solutions on established healthcare benchmarks (multiple-choice question answering). Recognizing that including the possible answers within the question is unrealistic (a setup found only in medical exams), and having observed a strong degradation in LLM performance in the absence of those options, we extend the context retrieval system in that direction. In particular, we propose OpenMedPrompt, a pipeline that improves the generation of more reliable open-ended answers, moving this technology closer to practical application.
