RaTEScore: A Metric for Radiology Report Generation

Abstract

This paper introduces a novel entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore), for assessing the quality of medical reports generated by AI models. RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, is robust to complex medical synonyms, and is sensitive to negation expressions. Technically, we developed a comprehensive medical NER dataset, RaTE-NER, and trained an NER model specifically for this purpose. This model decomposes complex radiological reports into their constituent medical entities. The metric itself is derived by comparing the similarity of entity embeddings obtained from a language model, taking into account each entity's type and clinical significance. Evaluations on both established public benchmarks and our newly proposed RaTE-Eval benchmark demonstrate that RaTEScore aligns more closely with human preference than existing metrics.
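The entity-level comparison described in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the RaTEScore implementation: entities are assumed to be already extracted by an NER model and embedded as vectors, and the type labels (with negated findings carrying a separate `NON-` type), the `TYPE_WEIGHTS` values, the type-mismatch `penalty`, and the harmonic-mean combination are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    etype: str              # hypothetical type label, e.g. "ABNORMALITY" or "NON-ABNORMALITY"
    embedding: list[float]  # assumed to come from some language-model encoder


# Hypothetical per-type importance weights (illustrative values only).
TYPE_WEIGHTS = {
    "ABNORMALITY": 1.0, "NON-ABNORMALITY": 1.0,
    "DISEASE": 1.0, "NON-DISEASE": 1.0,
    "ANATOMY": 0.5,
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def best_match(e: Entity, targets: list[Entity], penalty: float) -> float:
    """Highest similarity between e and any target; a type mismatch
    (e.g. an asserted vs. a negated finding) scales the similarity down."""
    scores = []
    for t in targets:
        s = cosine(e.embedding, t.embedding)
        if t.etype != e.etype:
            s *= penalty
        scores.append(s)
    return max(scores)


def directional_score(src: list[Entity], tgt: list[Entity], penalty: float = 0.5) -> float:
    """Type-weighted average of each source entity's best match in the target."""
    if not src or not tgt:
        return 0.0
    total = weight_sum = 0.0
    for e in src:
        w = TYPE_WEIGHTS.get(e.etype, 1.0)
        total += w * best_match(e, tgt, penalty)
        weight_sum += w
    return total / weight_sum


def entity_score(candidate: list[Entity], reference: list[Entity]) -> float:
    """Combine precision-like and recall-like scores with a harmonic mean."""
    p = directional_score(candidate, reference)
    r = directional_score(reference, candidate)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0
```

With toy embeddings, a candidate that negates a reference finding ("no pleural effusion" typed as `NON-ABNORMALITY`) scores below an identical report even though the two phrases embed identically, which is how this sketch mimics negation sensitivity. Keeping negation in the entity type rather than in the embedding is one design option; a real system could handle it differently.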
