

Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

May 29, 2023
Authors: Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, Tanishq Mathew Abraham
cs.AI

Abstract

We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high-dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates, indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in the reconstructions by using img2img with outputs from a separate autoencoder. All code is available on GitHub.
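To make the two-submodule design concrete, here is a minimal PyTorch sketch of the idea described in the abstract: a shared residual MLP maps flattened fMRI voxels toward CLIP image-embedding space, a retrieval head is trained with a CLIP-style contrastive (InfoNCE) loss against frozen CLIP image embeddings, and a parallel reconstruction head produces embeddings that a diffusion prior or generative model could consume. This is an illustration under assumptions, not the authors' released code: the voxel count, hidden width, depth, and the simple MSE term standing in for the diffusion-prior objective are placeholders.

```python
# Minimal sketch of a MindEye-style two-submodule mapper (assumed shapes, not the paper's values).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MindEyeSketch(nn.Module):
    def __init__(self, n_voxels=15000, clip_dim=768, hidden=2048, n_blocks=4):
        super().__init__()
        self.proj_in = nn.Linear(n_voxels, hidden)
        # Residual MLP backbone shared by both submodules.
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.LayerNorm(hidden), nn.Linear(hidden, hidden), nn.GELU())
            for _ in range(n_blocks)
        ])
        # Parallel heads: one for retrieval (contrastive), one for reconstruction.
        self.retrieval_head = nn.Linear(hidden, clip_dim)
        self.reconstruction_head = nn.Linear(hidden, clip_dim)

    def forward(self, voxels):
        h = self.proj_in(voxels)
        for block in self.blocks:
            h = h + block(h)  # residual connection
        return self.retrieval_head(h), self.reconstruction_head(h)

def contrastive_loss(brain_emb, clip_emb, temperature=0.07):
    """Symmetric InfoNCE loss between brain embeddings and CLIP image embeddings."""
    brain_emb = F.normalize(brain_emb, dim=-1)
    clip_emb = F.normalize(clip_emb, dim=-1)
    logits = brain_emb @ clip_emb.T / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random tensors standing in for fMRI patterns and frozen CLIP embeddings.
model = MindEyeSketch()
voxels = torch.randn(8, 15000)        # batch of flattened fMRI voxel patterns
clip_targets = torch.randn(8, 768)    # CLIP image embeddings of the viewed images
retrieval_emb, recon_emb = model(voxels)
loss = contrastive_loss(retrieval_emb, clip_targets) + F.mse_loss(recon_emb, clip_targets)
```

Keeping the heads separate reflects the abstract's point that retrieval and reconstruction benefit from specialized submodules: the contrastive head only needs embeddings that rank the correct image highest, while the reconstruction head must produce embeddings a generative model can decode.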