CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

Abstract

Humans can develop internal world models that encode common-sense knowledge, telling them how the world works and predicting the consequences of their actions. This concept has emerged in recent preliminary works as a promising direction for building general-purpose machine-learning models, e.g., for visual representation learning. In this paper, we present CheXWorld, the first effort towards a self-supervised world model for radiographic images. Specifically, our work develops a unified framework that simultaneously models three aspects of medical knowledge essential for qualified radiologists: 1) local anatomical structures, describing the fine-grained characteristics of local tissues (e.g., architectures, shapes, and textures); 2) global anatomical layouts, describing the global organization of the human body (e.g., layouts of organs and skeletons); and 3) domain variations, which encourage CheXWorld to model the transitions across different appearance domains of radiographs (e.g., varying clarity, contrast, and exposure caused by collecting radiographs from different hospitals, devices, or patients). Empirically, we design tailored qualitative and quantitative analyses, revealing that CheXWorld successfully captures these three dimensions of medical knowledge. Furthermore, transfer learning experiments across eight medical image classification and segmentation benchmarks show that CheXWorld significantly outperforms existing self-supervised learning (SSL) methods and large-scale medical foundation models. Code and pre-trained models are available at https://github.com/LeapLabTHU/CheXWorld.
