RaTEScore: A Metric for Radiology Report Generation
June 24, 2024
Authors: Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
cs.AI
Abstract
This paper introduces a novel, entity-aware metric, termed Radiological
Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports
generated by AI models. RaTEScore emphasizes crucial medical entities such as
diagnostic outcomes and anatomical details, is robust to complex medical
synonyms, and is sensitive to negation expressions. Technically, we developed
a comprehensive medical NER dataset, RaTE-NER, and trained an NER model
specifically for this purpose. This model decomposes complex radiological
reports into their constituent medical entities. The metric itself is derived
by comparing the similarity of entity embeddings obtained from a language
model, taking into account each entity's type and its clinical significance.
Our evaluations demonstrate that RaTEScore aligns more closely with human
preference than existing metrics, as validated on both established public
benchmarks and our newly proposed RaTE-Eval benchmark.
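The abstract describes the scoring step only at a high level. What follows is a minimal, hypothetical sketch of how an entity-level, type-aware similarity score could be computed once two reports have been decomposed into typed entities. It is not the authors' implementation. The entity type names, the `TYPE_WEIGHTS` values, the `mismatch_penalty` used to make the score sensitive to negated findings, and the toy three-dimensional vectors that stand in for language-model embeddings are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    etype: str        # hypothetical type label, e.g. "ABNORMALITY", "ANATOMY"
    embedding: tuple  # toy stand-in for a language-model entity embedding


# Hypothetical per-type weights (an assumption): findings count more than anatomy.
TYPE_WEIGHTS = {"ABNORMALITY": 1.0, "NON_ABNORMALITY": 1.0, "DISEASE": 1.0,
                "NON_DISEASE": 1.0, "ANATOMY": 0.5}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_score(source, target, mismatch_penalty=0.5):
    """Type-weighted average of each source entity's best match in target.

    Matches across different types (e.g. a finding vs. its negated form)
    are down-weighted by the hypothetical mismatch_penalty.
    """
    if not source:
        return 0.0
    total, weight_sum = 0.0, 0.0
    for e in source:
        best = 0.0
        for t in target:
            sim = cosine(e.embedding, t.embedding)
            if e.etype != t.etype:
                sim *= mismatch_penalty
            best = max(best, sim)
        w = TYPE_WEIGHTS.get(e.etype, 1.0)
        total += w * best
        weight_sum += w
    return total / weight_sum


def entity_f1(candidate, reference):
    """Combine precision (candidate -> reference) and recall (reference -> candidate)."""
    precision = match_score(candidate, reference)
    recall = match_score(reference, candidate)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage: toy entities as an NER step might produce them (hypothetical values).
ref = [Entity("pleural effusion", "ABNORMALITY", (1.0, 0.0, 0.0)),
       Entity("left lung", "ANATOMY", (0.0, 1.0, 0.0))]
same = [Entity("effusion", "ABNORMALITY", (1.0, 0.0, 0.0)),
        Entity("left lung", "ANATOMY", (0.0, 1.0, 0.0))]
negated = [Entity("no effusion", "NON_ABNORMALITY", (1.0, 0.0, 0.0)),
           Entity("left lung", "ANATOMY", (0.0, 1.0, 0.0))]

print(entity_f1(same, ref))               # 1.0
print(round(entity_f1(negated, ref), 3))  # 0.667
```

Matching each entity to its most similar counterpart in embedding space, rather than requiring exact string matches, is what would let a synonym such as "effusion" score well against "pleural effusion". Combining the two directions into an F1-style score is one plausible design choice here, not necessarily the one used in the paper.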