
Exploring the Boundaries of GPT-4 in Radiology

October 23, 2023
Authors: Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Pérez-García, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya V. Nori, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle
cs.AI

Abstract

The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on text-based applications for radiology reports, comparing it against state-of-the-art (SOTA) radiology-specific models. Exploring various prompting strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and found that GPT-4 either outperforms or is on par with current SOTA radiology models. With zero-shot prompting, GPT-4 already obtains substantial gains (approximately 10% absolute improvement) over radiology models in temporal sentence similarity classification (accuracy) and natural language inference ($F_1$). For tasks that require learning a dataset-specific style or schema (e.g. findings summarisation), GPT-4 improves with example-based prompting and matches supervised SOTA. Our extensive error analysis with a board-certified radiologist shows that GPT-4 has a sufficient level of radiology knowledge, with only occasional errors in complex contexts that require nuanced domain knowledge. For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually written impressions.
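
The difference between zero-shot and example-based prompting for findings summarisation can be pictured with a minimal sketch. This is an illustration only, not the authors' actual prompts or pipeline: the function name `build_prompt`, the "Findings:"/"Impression:" template, and the sample texts are all hypothetical, and no model call is made.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    query_findings: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a plain-text summarisation prompt (hypothetical template).

    With no examples the result is a zero-shot prompt. With examples, each
    (findings, impression) pair is shown before the query so the model can
    pick up the dataset-specific style.
    """
    parts = [instruction.strip()]
    for findings, impression in examples:
        parts.append(
            f"Findings: {findings.strip()}\nImpression: {impression.strip()}"
        )
    # The query ends with an open "Impression:" slot for the model to fill.
    parts.append(f"Findings: {query_findings.strip()}\nImpression:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder texts, not drawn from any dataset.
    instruction = "Summarise the findings into an impression."
    query = "No focal consolidation."
    demos = [("Lungs are clear.", "No acute disease.")]

    print(build_prompt(instruction, query))         # zero-shot
    print("-----")
    print(build_prompt(instruction, query, demos))  # example-based
```

In this sketch the only change between the two settings is whether worked (findings, impression) pairs precede the query, which is how example-based prompting lets a model infer an expected style or schema without any fine-tuning.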