CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

Abstract

Humans can develop internal world models that encode common-sense knowledge, telling them how the world works and predicting the consequences of their actions. This concept has emerged in recent preliminary works as a promising direction for establishing general-purpose machine-learning models, e.g., for visual representation learning. In this paper, we present CheXWorld, the first effort towards a self-supervised world model for radiographic images. Specifically, our work develops a unified framework that simultaneously models three aspects of medical knowledge essential for qualified radiologists: 1) local anatomical structures, describing the fine-grained characteristics of local tissues (e.g., architectures, shapes, and textures); 2) global anatomical layouts, describing the global organization of the human body (e.g., layouts of organs and skeletons); and 3) domain variations, which encourage CheXWorld to model the transitions across different appearance domains of radiographs (e.g., varying clarity, contrast, and exposure caused by collecting radiographs from different hospitals, devices, or patients). Empirically, we design tailored qualitative and quantitative analyses, revealing that CheXWorld successfully captures these three dimensions of medical knowledge. Furthermore, transfer learning experiments across eight medical image classification and segmentation benchmarks show that CheXWorld significantly outperforms existing self-supervised learning (SSL) methods and large-scale medical foundation models. Code and pre-trained models are available at https://github.com/LeapLabTHU/CheXWorld.
