μ^2Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

Abstract

Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, in order to improve the accuracy and efficiency of diagnosis and to provide management advice. RRG is complicated by two key challenges: (1) the inherent complexity of extracting relevant information from imaging data under resource constraints, and (2) the difficulty of objectively evaluating discrepancies between model-generated and expert-written reports. To address these challenges, we propose μ^2LLM, a **mu**ltiscale **mu**ltimodal large language model for RRG tasks. The novel μ^2Tokenizer, an intermediate layer, integrates multimodal features from the multiscale visual tokenizer and the text tokenizer; report generation quality is then enhanced through direct preference optimization (DPO) guided by GREEN-RedLlama. Experimental results on four large CT image-report medical datasets demonstrate that our method outperforms existing approaches, highlighting the potential of μ^2LLMs fine-tuned on limited data for RRG tasks.
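The DPO step mentioned above can be illustrated with a minimal sketch, assuming the standard per-pair DPO objective and a hypothetical pairing rule in which GREEN-RedLlama scores (passed in here as plain floats) pick the best- and worst-scored candidate reports as the "chosen" and "rejected" responses. The function names, the default `beta`, and the pairing rule are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def build_preference_pair(candidates: Sequence[str],
                          scores: Sequence[float]) -> Tuple[str, str]:
    """Hypothetical pairing rule: highest-scored report is 'chosen',
    lowest-scored report is 'rejected'. Scores are assumed to come from
    an external evaluator such as GREEN-RedLlama."""
    best = max(range(len(candidates)), key=lambda i: scores[i])
    worst = min(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], candidates[worst]


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a whole report under
    the trainable policy or a frozen reference model.
    """
    # Log-ratio of policy vs. reference for each report.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled implicit reward margin between chosen and rejected reports.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    chosen, rejected = build_preference_pair(
        ["report A", "report B", "report C"], [0.2, 0.9, 0.1])
    print(chosen, "|", rejected)
    print(dpo_loss(-10.0, -20.0, -12.0, -18.0))
```

When the policy and reference agree (zero margin) the loss equals log 2; as the policy assigns relatively more probability to the preferred report, the margin grows and the loss falls toward zero.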