Exploring the Boundaries of GPT-4 in Radiology

October 23, 2023
Authors: Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Pérez-García, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya V. Nori, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle
cs.AI

Abstract

The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on text-based applications for radiology reports, comparing against state-of-the-art (SOTA) radiology-specific models. Exploring various prompting strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and found that GPT-4 either outperforms or is on par with current SOTA radiology models. With zero-shot prompting, GPT-4 already obtains substantial gains (approximately 10% absolute improvement) over radiology models in temporal sentence similarity classification (accuracy) and natural language inference (F1). For tasks that require learning dataset-specific style or schema (e.g. findings summarisation), GPT-4 improves with example-based prompting and matches supervised SOTA. Our extensive error analysis with a board-certified radiologist shows that GPT-4 has a sufficient level of radiology knowledge, with only occasional errors in complex contexts that require nuanced domain knowledge. For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually written impressions.