CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding

Abstract

Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.
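
To make the idea of refining token-level logits by contrast more concrete, the following is a minimal sketch of a generic contrastive adjustment at a single decoding step. It is not the dual-stage mechanism of CCD, whose exact formulation is not given in the abstract. The function names, the `alpha` strength parameter, and the assumption that the expert signal enters through a second, expert-guided forward pass are all hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_logits(base_logits, guided_logits, alpha=0.5):
    """Generic contrastive adjustment (assumed form, not the paper's rule).

    base_logits:   next-token logits from the unmodified MLLM prompt.
    guided_logits: next-token logits from a hypothetical second pass whose
                   prompt also carries the expert model's clinical signal.
    alpha:         hypothetical strength of the contrast (0 disables it).
    """
    base = log_softmax(base_logits)
    guided = log_softmax(guided_logits)
    # Amplify what the expert-guided pass prefers relative to the base pass.
    return [g + alpha * (g - b) for b, g in zip(base, guided)]


def greedy_pick(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy three-token vocabulary; the numbers are made up for illustration.
vocab = ["no", "effusion", "pneumothorax"]
base = [1.0, 2.0, 2.1]      # base pass slightly prefers "pneumothorax"
guided = [1.0, 2.5, 1.5]    # expert-guided pass prefers "effusion"

adjusted = contrastive_logits(base, guided, alpha=0.5)
print(vocab[greedy_pick(base)], "->", vocab[greedy_pick(adjusted)])
```

In this toy case the base pass would greedily pick "pneumothorax", while the adjusted scores pick "effusion". The base model's weights are never touched; only the scores used for token selection change, which is the sense in which such a scheme is training-free.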
