MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

June 3, 2024
Authors: Robert Osazuwa Ness, Katie Matton, Hayden Helm, Sheng Zhang, Junaid Bajwa, Carey E. Priebe, Eric Horvitz
cs.AI

Abstract

Large language models (LLMs) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that this performance generalizes to real-world clinical settings. Medical question-answering benchmarks rely on assumptions that are convenient for quantifying LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad knowledge that can help them generalize to practical conditions regardless of the unrealistic assumptions in celebrated benchmarks. We seek to quantify how well LLM medical question-answering benchmark performance generalizes when those benchmark assumptions are violated. Specifically, we present an adversarial method that we call MedFuzz (for medical fuzzing). MedFuzz attempts to modify benchmark questions in ways aimed at confounding the LLM. We demonstrate the approach by targeting strong assumptions about patient characteristics presented in the MedQA benchmark. Successful "attacks" modify a benchmark item in ways that would be unlikely to fool a medical expert but that nonetheless "trick" the LLM into changing from a correct to an incorrect answer. Further, we present a permutation test technique that can ensure a successful attack is statistically significant. We show how to use performance on a "MedFuzzed" benchmark, as well as individual successful attacks, to probe robustness; these methods show promise for providing insight into an LLM's ability to operate reliably in more realistic settings.
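As a rough illustration of the procedure the abstract describes, here is a minimal sketch of the fuzzing loop in Python. The `attacker_llm` and `target_llm` callables, the prompt wording, and the turn limit are all assumptions introduced for illustration; this is not the authors' released implementation.

```python
# Minimal sketch of a MedFuzz-style attack loop (an assumption-laden
# illustration of the abstract, not the paper's code).
from typing import Callable, Optional

def medfuzz_attack(
    question: str,
    options: dict,              # e.g. {"A": "...", "B": "..."}
    correct_answer: str,        # option letter, e.g. "B"
    attacker_llm: Callable[[str], str],  # hypothetical: rewrites question text
    target_llm: Callable[[str], str],    # hypothetical: returns an option letter
    max_turns: int = 5,
) -> Optional[str]:
    """Iteratively rewrite one benchmark item until the target errs.

    Returns the modified question if the target's answer flips from
    correct to incorrect, or None if it survives `max_turns` rewrites.
    """
    current = question
    for _ in range(max_turns):
        # The attacker adds distracting patient characteristics that a
        # medical expert would recognize as irrelevant -- the MedQA
        # assumption the paper targets.
        current = attacker_llm(
            "Rewrite this exam question, adding patient characteristics "
            "that must not change the correct answer:\n" + current
        )
        item = current + "\n" + "\n".join(
            f"{k}. {v}" for k, v in options.items()
        )
        if target_llm(item) != correct_answer:
            return current  # candidate successful attack
    return None
```

The abstract does not spell out the permutation test's statistic. A standard paired permutation test over per-item correctness is one plausible shape, sketched below under that assumption; the paper's exact per-attack procedure may differ.

```python
import random

def paired_permutation_pvalue(orig: list, fuzzed: list,
                              n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for H0: fuzzing does not reduce accuracy.

    `orig` and `fuzzed` are paired 0/1 correctness indicators for the
    same items before and after MedFuzzing.
    """
    rng = random.Random(seed)
    diffs = [o - f for o, f in zip(orig, fuzzed)]
    observed = sum(diffs)
    exceed = 0
    for _ in range(n_perm):
        # Under H0 the original/fuzzed labels are exchangeable within
        # each pair, so randomly flip the sign of each difference.
        stat = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

A small p-value would indicate that the accuracy drop on the fuzzed items is unlikely to arise from chance relabeling alone.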
