BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Abstract

Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has become the standard approach for modeling imaging data, but it places two competing demands on the tokenizer: the encoder embeddings must retain the clinical information that downstream tasks rely on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric masked autoencoder (MAE)-based tokenizer for 3D brain MRI latent diffusion that decouples the encoder from the decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and more than 200 acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder outperforms or matches state-of-the-art models (i.e., BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together, these results establish a single 3D brain MRI embedding space suited to both downstream clinical tasks and controllable generation.
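
To make the linear-probing protocol concrete, the following is a minimal sketch in plain Python, not the implementation evaluated here. It assumes a toy stand-in for the frozen encoder (a fixed random projection followed by tanh, rather than a 3D MAE), hypothetical dimensions `IN_DIM` and `EMB_DIM`, and a synthetic regression target constructed to be linear in the embedding. The only point it illustrates is that the encoder weights stay fixed while a single linear layer is fit on top of the embeddings.

```python
import math
import random

random.seed(0)

IN_DIM, EMB_DIM = 32, 4  # hypothetical sizes, chosen only for illustration

# Stand-in for a frozen, pretrained encoder: these weights are never updated.
W_ENC = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(IN_DIM)]


def encode(x):
    """Map a flattened input to an embedding using the frozen weights."""
    scale = math.sqrt(IN_DIM)
    return [
        math.tanh(sum(xi * W_ENC[i][j] for i, xi in enumerate(x)) / scale)
        for j in range(EMB_DIM)
    ]


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_linear_probe(embeddings, targets, ridge=1e-6):
    """Fit ridge least squares on [embedding, 1]; only the probe is trained."""
    feats = [z + [1.0] for z in embeddings]
    d = len(feats[0])
    gram = [
        [sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(d)]
    return solve(gram, rhs)


def predict(probe, z):
    """Apply the linear probe (weights plus bias) to one embedding."""
    return sum(w * f for w, f in zip(probe, z + [1.0]))


def make_toy(n):
    """Synthetic data: the target is linear in the embedding by construction."""
    xs = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(n)]
    zs = [encode(x) for x in xs]
    ys = [2.0 * z[0] - z[1] + 0.5 for z in zs]
    return zs, ys


encoder_snapshot = [row[:] for row in W_ENC]
train_z, train_y = make_toy(200)
test_z, test_y = make_toy(50)

probe = fit_linear_probe(train_z, train_y)
max_err = max(abs(predict(probe, z) - y) for z, y in zip(test_z, test_y))
print(f"max held-out error: {max_err:.2e}")
```

Because the probe is linear, its held-out performance reflects how much task-relevant information is linearly accessible in the embeddings. On a real benchmark the targets would be clinical labels rather than a synthetic function of the embedding, and classification tasks would typically use a logistic probe instead of least squares.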