BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

June 17, 2026
Authors: Max Van Puyvelde, Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert
cs.AI

Abstract

Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has become the go-to approach for modeling imaging data, but it places two competing demands on the tokenizer: encoder embeddings must retain the clinical information that downstream tasks act on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric masked-autoencoder (MAE) based tokenizer for 3D brain MRI latent diffusion that decouples the encoder from the decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and 200+ acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder matches or outperforms state-of-the-art models (namely BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together, these results establish a single 3D brain-MRI embedding space that serves both downstream clinical tasks and controllable generation.
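
The decoupled data flow described in the abstract (frozen encoder, then a linear projection, then a separate decoder) can be illustrated at the level of tensor shapes. The following is a minimal NumPy sketch, not the authors' implementation: the patch size, embedding width, and latent width are hypothetical, and random matrices stand in for the frozen 3D MAE encoder and for the CNN decoder, neither of which is specified in the abstract.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only; the abstract gives none of them.
PATCH, EMBED_DIM, LATENT_DIM = 4, 32, 8


def patchify(volume, p):
    """Split a (D, H, W) volume into non-overlapping p*p*p patches -> (N, p**3)."""
    d, h, w = volume.shape
    assert d % p == 0 and h % p == 0 and w % p == 0, "dims must be divisible by p"
    x = volume.reshape(d // p, p, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (grid_d, grid_h, grid_w, p, p, p)
    return x.reshape(-1, p ** 3)


def unpatchify(patches, shape, p):
    """Inverse of patchify: (N, p**3) -> (D, H, W)."""
    d, h, w = shape
    x = patches.reshape(d // p, h // p, w // p, p, p, p)
    x = x.transpose(0, 3, 1, 4, 2, 5)  # (grid_d, p, grid_h, p, grid_w, p)
    return x.reshape(d, h, w)


rng = np.random.default_rng(0)
# Stand-in for the frozen encoder: fixed weights that are never updated.
W_enc = rng.standard_normal((PATCH ** 3, EMBED_DIM))
# Linear projection of the embeddings, which feeds the decoder.
W_proj = rng.standard_normal((EMBED_DIM, LATENT_DIM))
# Stand-in for the decoder (a single linear map here; the paper uses a CNN).
W_dec = rng.standard_normal((LATENT_DIM, PATCH ** 3))


def encode(volume):
    """Frozen-encoder stand-in: one embedding per 3D patch."""
    return patchify(volume, PATCH) @ W_enc


def decode(embeddings, shape):
    """Decoder stand-in: project embeddings linearly, then map back to voxels."""
    latents = embeddings @ W_proj
    return unpatchify(latents @ W_dec, shape, PATCH)


volume = rng.standard_normal((8, 8, 8))  # toy volume, far smaller than a real MRI
tokens = encode(volume)
recon = decode(tokens, volume.shape)
print(tokens.shape, recon.shape)  # (8, 32) (8, 8, 8)
```

In this sketch, downstream tasks such as linear probing would read `tokens`, while only `W_proj` and the decoder would be trained for reconstruction. Because the encoder weights stay fixed, reconstruction training cannot alter the embeddings that downstream tasks rely on, which is the point of the decoupling.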