RaTEScore: A Metric for Radiology Report Generation

Abstract

This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models. RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details; it is robust to complex medical synonyms and sensitive to negation expressions. Technically, we developed a comprehensive medical named entity recognition (NER) dataset, RaTE-NER, and trained an NER model specifically for this purpose. This model decomposes complex radiological reports into their constituent medical entities. The metric itself is derived by comparing the similarity of entity embeddings, obtained from a language model, based on the entities' types and their relevance to clinical significance. Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, as validated on both established public benchmarks and our newly proposed RaTE-Eval benchmark.
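The pipeline outlined in the abstract (decompose each report into typed entities, embed the entities, then compare embeddings by type) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several pieces are assumptions made only for illustration: entities are written by hand rather than produced by the RaTE-NER model, the `embed` function uses character-trigram counts as a stand-in for language-model embeddings, the entity types, the explicit negation flag, and the `TYPE_WEIGHT` values are hypothetical, and combining the two directions with a harmonic mean is one plausible choice rather than a documented one.

```python
import math
from collections import Counter

# Hypothetical type weights: illustrative values, not taken from the paper.
TYPE_WEIGHT = {"abnormality": 1.0, "anatomy": 0.5}


def embed(text):
    """Toy embedding: character-trigram counts (stand-in for a language model)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def directional_score(source, target):
    """Weighted mean of each source entity's best match among target entities.

    An entity is a (text, entity_type, negated) tuple. Only entities with the
    same type and the same negation flag are allowed to match (an assumption).
    """
    total = 0.0
    weight_sum = 0.0
    for text, etype, negated in source:
        weight = TYPE_WEIGHT.get(etype, 0.5)
        best = 0.0
        for t_text, t_type, t_negated in target:
            if t_type == etype and t_negated == negated:
                best = max(best, cosine(embed(text), embed(t_text)))
        total += weight * best
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def entity_score(candidate, reference):
    """Harmonic mean of the candidate-to-reference and reverse scores."""
    p = directional_score(candidate, reference)
    r = directional_score(reference, candidate)
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    reference = [("pleural effusion", "abnormality", False),
                 ("left lung", "anatomy", False)]
    negated = [("pleural effusion", "abnormality", True),
               ("left lung", "anatomy", False)]
    print(entity_score(reference, reference))
    print(entity_score(negated, reference))
```

In this toy setup, a candidate that negates a finding present in the reference scores lower than an exact match, which mirrors the sensitivity to negation that the abstract describes. Trigram counts cannot capture synonyms, so robustness to medical synonymy would depend on the real language-model embeddings.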
