BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Abstract

Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has become the standard approach for modeling imaging data, but it places two competing demands on the tokenizer: the encoder embeddings must retain the clinical information that downstream tasks rely on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric, masked autoencoder (MAE)-based tokenizer for 3D brain MRI latent diffusion that decouples the encoder from the decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and more than 200 acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder outperforms or matches state-of-the-art models (i.e., BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together, these results establish a single 3D brain MRI embedding space that serves both downstream clinical tasks and controllable generation.
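
For readers unfamiliar with linear probing, the evaluation protocol named above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the encoder here is a hypothetical stand-in (a fixed random projection followed by tanh), the volumes and labels are synthetic, and the probe is fitted by closed-form ridge regression, which is only one of several ways to fit a linear probe. The point it shows is that the encoder weights are never updated and only a linear layer is trained on top of the embeddings.

```python
import numpy as np


def frozen_encoder(volumes, weights):
    """Hypothetical stand-in for a frozen pretrained encoder.

    Flattens each toy 3D volume and applies a fixed linear map plus tanh.
    The weights are passed in and never modified.
    """
    flat = volumes.reshape(len(volumes), -1)
    return np.tanh(flat @ weights)


def add_bias_column(embeddings):
    """Append a column of ones so the probe can learn an intercept."""
    return np.hstack([embeddings, np.ones((len(embeddings), 1))])


def fit_linear_probe(embeddings, targets, l2=1e-2):
    """Fit a linear probe by ridge regression in closed form.

    For brevity the intercept is regularized together with the weights.
    """
    x = add_bias_column(embeddings)
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def predict(embeddings, probe):
    """Apply the fitted linear probe to embeddings."""
    return add_bias_column(embeddings) @ probe


rng = np.random.default_rng(0)

# Assumed toy setup: 200 synthetic 4x4x4 "volumes" and a 16-dim embedding.
encoder_weights = rng.normal(size=(4 * 4 * 4, 16)) / 8.0
weights_before = encoder_weights.copy()
volumes = rng.normal(size=(200, 4, 4, 4))
embeddings = frozen_encoder(volumes, encoder_weights)

# Synthetic binary labels that depend linearly on the embeddings.
true_direction = rng.normal(size=16)
labels = (embeddings @ true_direction > 0).astype(float)

# Train the probe on the first 150 samples, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, 200)
probe = fit_linear_probe(embeddings[train], labels[train])
predicted = predict(embeddings[test], probe) > 0.5
accuracy = float(np.mean(predicted == (labels[test] > 0.5)))
```

In a real benchmark the stand-in encoder would be replaced by the pretrained model and the synthetic labels by task labels; the probe itself stays a single linear layer so that the score reflects what the frozen embeddings already encode.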