Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
May 29, 2023
Authors: Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, Tanishq Mathew Abraham
cs.AI
Abstract
We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high-dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates, indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in the reconstructions by using img2img, with outputs from a separate autoencoder. All code is available on GitHub.
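The abstract describes the retrieval submodule as a contrastively trained mapping from fMRI activity into CLIP image space, with retrieval done by nearest-neighbor search over candidate image embeddings. Below is a minimal, hypothetical PyTorch sketch of that idea only, not the authors' released implementation: the class name BrainToCLIP, the voxel count, the hidden width, the single 768-d CLIP target vector, and the temperature are all illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BrainToCLIP(nn.Module):
    """Toy MLP mapping flattened fMRI voxels to a CLIP-like image-embedding space.
    Sizes are placeholders, not MindEye's actual architecture."""
    def __init__(self, num_voxels: int = 15724, clip_dim: int = 768, hidden: int = 4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_voxels, hidden),
            nn.LayerNorm(hidden),
            nn.GELU(),
            nn.Linear(hidden, clip_dim),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # L2-normalize so dot products below are cosine similarities.
        return F.normalize(self.net(voxels), dim=-1)

def contrastive_loss(brain_emb: torch.Tensor, clip_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric CLIP-style InfoNCE loss: matched (fMRI, image) pairs are positives,
    all other pairs in the batch are negatives."""
    logits = brain_emb @ clip_emb.T / temperature
    targets = torch.arange(len(brain_emb), device=brain_emb.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random tensors standing in for fMRI patterns and precomputed
# CLIP image embeddings of the viewed images.
model = BrainToCLIP()
voxels = torch.randn(8, 15724)                            # batch of fMRI patterns
clip_targets = F.normalize(torch.randn(8, 768), dim=-1)   # CLIP image embeddings
loss = contrastive_loss(model(voxels), clip_targets)
loss.backward()

# Retrieval: rank a candidate pool of CLIP image embeddings by cosine similarity
# to the brain embedding and take the best match.
with torch.no_grad():
    query = model(voxels[:1])                              # (1, 768)
    pool = F.normalize(torch.randn(1000, 768), dim=-1)     # candidate image embeddings
    best = (query @ pool.T).argmax(dim=-1)                 # index of best-matching image
```

Per the abstract, the reconstruction path is separate: a diffusion prior maps the brain embeddings into embeddings a pretrained generative model accepts, and a separate autoencoder supplies low-level image features via img2img; the sketch above covers only the contrastive retrieval idea.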