MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
March 17, 2024
Authors: Paul S. Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A. Norman, Tanishq Mathew Abraham
cs.AI
Abstract
Reconstructions of visual perception from brain activity have improved tremendously, but the practical utility of such methods has been limited. This is because such models are trained independently per subject, where each subject requires dozens of hours of expensive fMRI training data to attain high-quality results. The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. Our novel functional alignment procedure linearly maps all brain data to a shared-subject latent space, followed by a shared non-linear mapping to CLIP image space. We then map from CLIP space to pixel space by fine-tuning Stable Diffusion XL to accept CLIP latents as inputs instead of text. This approach improves out-of-subject generalization with limited training data and also attains state-of-the-art image retrieval and reconstruction metrics compared to single-subject approaches. MindEye2 demonstrates how accurate reconstructions of perception are possible from a single visit to the MRI facility. All code is available on GitHub.
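The functional alignment step described in the abstract can be pictured as a per-subject linear map into a shared latent space, followed by a single non-linear head shared across subjects. The PyTorch sketch below is illustrative only: the class name `SharedSubjectAligner`, the layer sizes, and the CLIP embedding width are assumptions rather than the authors' released architecture (see the MindEye2 repository on GitHub for the actual implementation).

```python
# Minimal sketch of a shared-subject alignment model (assumed dimensions and names).
import torch
import torch.nn as nn


class SharedSubjectAligner(nn.Module):
    def __init__(self, voxel_counts, shared_dim=4096, clip_dim=1664):
        super().__init__()
        # One linear map per subject: flattened voxels -> shared-subject latent space.
        self.subject_linears = nn.ModuleDict({
            str(subject): nn.Linear(n_voxels, shared_dim)
            for subject, n_voxels in voxel_counts.items()
        })
        # Shared non-linear mapping from the shared latent space to CLIP image space,
        # reused across all subjects; this is what cross-subject pretraining trains.
        self.shared_mlp = nn.Sequential(
            nn.Linear(shared_dim, shared_dim),
            nn.GELU(),
            nn.Linear(shared_dim, clip_dim),
        )

    def forward(self, voxels, subject_id):
        shared_latent = self.subject_linears[str(subject_id)](voxels)
        return self.shared_mlp(shared_latent)


# Adapting to a new subject then mainly means fitting a fresh per-subject linear
# map on a small amount of data, while reusing the pretrained shared head.
aligner = SharedSubjectAligner({1: 15000, 2: 14000, 8: 13500})
fake_scan = torch.randn(4, 13500)      # batch of flattened fMRI patterns for subject 8
clip_pred = aligner(fake_scan, subject_id=8)
print(clip_pred.shape)                 # torch.Size([4, 1664])
```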