Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
October 29, 2025
Authors: Roman Beliy, Amit Zalcher, Jonathan Kogman, Navve Wasserman, Michal Irani
cs.AI
Abstract
Reconstructing images seen by people from their fMRI brain recordings
provides a non-invasive window into the human brain. Despite recent progress
enabled by diffusion models, current methods often lack faithfulness to the
actual seen images. We present "Brain-IT", a brain-inspired approach that
addresses this challenge through a Brain Interaction Transformer (BIT),
allowing effective interactions between clusters of functionally similar
brain voxels. These functional clusters are shared by all subjects, serving as
building blocks for integrating information both within and across brains. All
model components are shared by all clusters and subjects, allowing efficient
training with a limited amount of data. To guide the image reconstruction, BIT
predicts two complementary localized patch-level image features: (i) high-level
semantic features, which steer the diffusion model toward the correct semantic
content of the image; and (ii) low-level structural features, which help to
initialize the diffusion process with the correct coarse layout of the image.
BIT's design enables direct flow of information from brain-voxel clusters to
localized image features. Through these principles, our method produces
reconstructions from fMRI that are faithful to the seen images and surpass
current state-of-the-art approaches both visually and on standard objective
metrics. Moreover, with only 1 hour of fMRI data from a new subject, we achieve
results comparable to those of current methods trained on full 40-hour recordings.
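The data flow the abstract describes — voxels grouped into shared functional clusters, whose embeddings are queried by localized image patches through two prediction heads — can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: all sizes are invented, the cluster assignment is random (the real method uses a data-driven functional parcellation shared across subjects), and the single cross-attention step stands in for full BIT transformer blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the abstract does not specify the real dimensions.
N_VOXELS = 2000      # fMRI voxels for one subject
N_CLUSTERS = 64      # functional clusters shared across subjects
N_PATCHES = 16 * 16  # image patches whose features are predicted
D = 32               # embedding dimension

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# 1) Assign each voxel to a shared functional cluster (random stand-in for
#    a learned functional parcellation).
cluster_of_voxel = rng.integers(0, N_CLUSTERS, size=N_VOXELS)

# 2) Aggregate voxel responses into one embedding per cluster, using a
#    projection shared by all clusters and subjects (toy mean-pooling here).
voxel_response = rng.standard_normal(N_VOXELS)   # one fMRI scan
W_embed = rng.standard_normal(D) * 0.1           # shared projection
cluster_emb = np.zeros((N_CLUSTERS, D))
for c in range(N_CLUSTERS):
    members = voxel_response[cluster_of_voxel == c]
    if members.size:
        cluster_emb[c] = members.mean() * W_embed

# 3) Cross-attention: patch queries attend to cluster embeddings, giving a
#    direct information path from voxel clusters to localized image features.
patch_queries = rng.standard_normal((N_PATCHES, D)) * 0.1
attn = softmax(patch_queries @ cluster_emb.T / np.sqrt(D))  # (N_PATCHES, N_CLUSTERS)
patch_repr = attn @ cluster_emb                             # (N_PATCHES, D)

# 4) Two heads: high-level semantic features (steer diffusion semantics) and
#    low-level structural features (initialize the coarse layout).
W_sem = rng.standard_normal((D, D)) * 0.1
W_struct = rng.standard_normal((D, D)) * 0.1
semantic_feats = patch_repr @ W_sem
structural_feats = patch_repr @ W_struct

print(semantic_feats.shape, structural_feats.shape)  # (256, 32) (256, 32)
```

Because every component after the cluster assignment is shared across clusters and subjects, only the voxel-to-cluster mapping is subject-specific, which is what makes training on limited per-subject data feasible in the paper's framing.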