Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
October 29, 2025
Authors: Roman Beliy, Amit Zalcher, Jonathan Kogman, Navve Wasserman, Michal Irani
cs.AI
Abstract
Reconstructing images seen by people from their fMRI brain recordings
provides a non-invasive window into the human brain. Despite recent progress
enabled by diffusion models, current methods often lack faithfulness to the
actual seen images. We present "Brain-IT", a brain-inspired approach that
addresses this challenge through a Brain Interaction Transformer (BIT),
allowing effective interactions between clusters of functionally similar
brain voxels. These functional clusters are shared by all subjects, serving as
building blocks for integrating information both within and across brains. All
model components are shared by all clusters and subjects, allowing efficient
training with a limited amount of data. To guide the image reconstruction, BIT
predicts two complementary localized patch-level image features: (i) high-level
semantic features, which steer the diffusion model toward the correct semantic
content of the image; and (ii) low-level structural features, which help to
initialize the diffusion process with the correct coarse layout of the image.
BIT's design enables a direct flow of information from brain-voxel clusters to
localized image features. Through these principles, our method achieves image
reconstructions from fMRI that are faithful to the seen images, surpassing
current state-of-the-art approaches both visually and by standard objective
metrics. Moreover, with only 1 hour of fMRI data from a new subject, we achieve
results comparable to current methods trained on full 40-hour recordings.
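The abstract describes a direct information path from shared brain-voxel clusters to localized patch-level image features. The following is a minimal illustrative sketch of that idea, not the authors' implementation: image-patch queries cross-attend over fMRI-derived cluster tokens, and two linear heads map the result to semantic-like and structural-like targets. All dimensions, weight initializations, and names here are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(patch_queries, cluster_tokens, d_k=32):
    """Each image-patch query attends over all voxel-cluster tokens,
    giving a direct path from brain clusters to local patch features.
    Weights are random placeholders; a real model would learn them."""
    Wq = rng.standard_normal((patch_queries.shape[-1], d_k)) / np.sqrt(d_k)
    Wk = rng.standard_normal((cluster_tokens.shape[-1], d_k)) / np.sqrt(d_k)
    Wv = rng.standard_normal((cluster_tokens.shape[-1], d_k)) / np.sqrt(d_k)
    Q, K, V = patch_queries @ Wq, cluster_tokens @ Wk, cluster_tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # shape: (patches, clusters)
    return attn @ V                          # shape: (patches, d_k)

# Assumed sizes: 64 shared functional clusters, a 16x16 grid of image patches.
n_clusters, n_patches, d = 64, 256, 128
cluster_tokens = rng.standard_normal((n_clusters, d))  # fMRI-derived cluster embeddings
patch_queries = rng.standard_normal((n_patches, d))    # learned per-patch queries

patch_feats = cross_attention(patch_queries, cluster_tokens)

# Two hypothetical output heads: one toward high-level semantic features
# (e.g., a CLIP-like 768-dim space) and one toward low-level structural
# features (e.g., a small per-patch latent used to initialize diffusion).
semantic_head = rng.standard_normal((32, 768)) * 0.01
structural_head = rng.standard_normal((32, 4)) * 0.01
semantic = patch_feats @ semantic_head       # (patches, 768)
structural = patch_feats @ structural_head   # (patches, 4)
print(patch_feats.shape, semantic.shape, structural.shape)
```

Because the attention weights are computed per patch over all clusters, every localized image feature can draw on any functional cluster, which is the "direct flow of information" property the abstract highlights.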