MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

Abstract

Reconstructions of visual perception from brain activity have improved tremendously, but the practical utility of such methods has been limited. This is because such models are trained independently for each subject, and each subject requires dozens of hours of expensive fMRI training data to attain high-quality results. The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. Our novel functional alignment procedure linearly maps all brain data to a shared-subject latent space, followed by a shared non-linear mapping to CLIP image space. We then map from CLIP space to pixel space by fine-tuning Stable Diffusion XL to accept CLIP latents as inputs instead of text. This approach improves out-of-subject generalization with limited training data and also attains state-of-the-art image retrieval and reconstruction metrics compared to single-subject approaches. MindEye2 demonstrates how accurate reconstructions of perception are possible from a single visit to the MRI facility. All code is available on GitHub.
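The functional alignment idea described above (a subject-specific linear map into a shared latent space, followed by one non-linear map shared by all subjects) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, the voxel counts, the subject names, the two-layer ReLU network, and the helper names `brain_to_clip` and `add_subject` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

SHARED_DIM = 64  # hypothetical size of the shared-subject latent space
CLIP_DIM = 32    # hypothetical stand-in for the CLIP image embedding size

# Hypothetical voxel counts: each subject's brain data has a different width.
voxels_per_subject = {"subj01": 500, "subj02": 430}

# Subject-specific linear maps: voxels -> shared latent space.
subject_maps = {
    name: rng.normal(scale=n_vox ** -0.5, size=(n_vox, SHARED_DIM))
    for name, n_vox in voxels_per_subject.items()
}

# Shared non-linear map (a tiny two-layer network here), reused for all subjects.
W1 = rng.normal(scale=SHARED_DIM ** -0.5, size=(SHARED_DIM, SHARED_DIM))
W2 = rng.normal(scale=SHARED_DIM ** -0.5, size=(SHARED_DIM, CLIP_DIM))


def brain_to_clip(fmri, subject):
    """Map a batch of fMRI patterns (batch, n_voxels) to CLIP-like embeddings."""
    shared = fmri @ subject_maps[subject]   # subject-specific, linear
    hidden = np.maximum(shared @ W1, 0.0)   # shared, non-linear (ReLU)
    return hidden @ W2                      # shared projection to CLIP-like space


def add_subject(name, n_voxels):
    """Register a new subject: only a new linear map is created; W1 and W2 are reused."""
    subject_maps[name] = rng.normal(
        scale=n_voxels ** -0.5, size=(n_voxels, SHARED_DIM)
    )


add_subject("new_subject", 610)
emb_a = brain_to_clip(rng.normal(size=(4, 500)), "subj01")
emb_b = brain_to_clip(rng.normal(size=(4, 610)), "new_subject")
print(emb_a.shape, emb_b.shape)  # (4, 32) (4, 32)
```

The point of the sketch is structural: inputs with different voxel counts end up in the same embedding space, and adding a subject introduces only one new linear map while the shared layers are reused. In the approach the abstract describes, those weights would be pretrained across subjects and then fine-tuned on the new subject's data, which this sketch does not attempt.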
