Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
May 29, 2023
Authors: Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, Tanishq Mathew Abraham
cs.AI
Abstract
We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high-dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates, indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in the reconstructions by using img2img, with outputs from a separate autoencoder. All code is available on GitHub.
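The abstract describes the retrieval submodule as a contrastively trained mapping from fMRI activity into CLIP image space, with retrieval done by nearest-neighbor search over candidate image embeddings. Below is a minimal, hypothetical PyTorch sketch of that idea only, not the authors' released implementation: the class name BrainToCLIP, the voxel count, the hidden width, the single 768-d CLIP target vector, and the temperature are all illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BrainToCLIP(nn.Module):
    """Toy MLP mapping flattened fMRI voxels to a CLIP-like image-embedding space.
    Sizes are placeholders, not MindEye's actual architecture."""
    def __init__(self, num_voxels: int = 15724, clip_dim: int = 768, hidden: int = 4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_voxels, hidden),
            nn.LayerNorm(hidden),
            nn.GELU(),
            nn.Linear(hidden, clip_dim),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # L2-normalize so dot products below are cosine similarities.
        return F.normalize(self.net(voxels), dim=-1)

def contrastive_loss(brain_emb: torch.Tensor, clip_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric CLIP-style InfoNCE loss: matched (fMRI, image) pairs are positives,
    all other pairs in the batch are negatives."""
    logits = brain_emb @ clip_emb.T / temperature
    targets = torch.arange(len(brain_emb), device=brain_emb.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random tensors standing in for fMRI patterns and precomputed
# CLIP image embeddings of the viewed images.
model = BrainToCLIP()
voxels = torch.randn(8, 15724)                            # batch of fMRI patterns
clip_targets = F.normalize(torch.randn(8, 768), dim=-1)   # CLIP image embeddings
loss = contrastive_loss(model(voxels), clip_targets)
loss.backward()

# Retrieval: rank a candidate pool of CLIP image embeddings by cosine similarity
# to the brain embedding and take the best match.
with torch.no_grad():
    query = model(voxels[:1])                              # (1, 768)
    pool = F.normalize(torch.randn(1000, 768), dim=-1)     # candidate image embeddings
    best = (query @ pool.T).argmax(dim=-1)                 # index of best-matching image
```

Per the abstract, the reconstruction path is separate: a diffusion prior maps the brain embeddings into embeddings a pretrained generative model accepts, and a separate autoencoder supplies low-level image features via img2img; the sketch above covers only the contrastive retrieval idea.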