From Activation to Causation: Discovering Causal Visual Representations in the Human Brain

Abstract

Identifying which regions of the human brain represent a given visual concept is a central challenge in neuroscience. Existing approaches localize coarse functional regions (e.g., those selective for faces or places) through activation maximization, that is, by identifying regions that respond more strongly to a target concept than to other concepts. Yet strong activation alone does not establish that a region represents the concept itself, because its responses may instead be driven by correlated visual or semantic cues. We introduce BrainCause, an automated framework that combines generative and brain models to synthesize controlled stimuli and validate neural representations through targeted causal testing. Given a query specifying a concept of interest, the framework constructs targeted stimulus sets comprising concept images, counterfactual edits that remove the target concept while preserving the rest of the image content, and images containing candidate correlated distractors. It then uses an image-to-fMRI encoding model to predict brain responses and searches for representations that respond to the target concept more selectively than to these correlated alternatives. BrainCause returns validated candidate representations and proposes follow-up fMRI experiments to further test or extend its discoveries. Our approach recovers known functional localizations and identifies new candidate representations across dozens of concepts, validated on both predicted and measured fMRI data. Critically, we show that without causal validation a large fraction of localizations would be false positives, confirming that activation alone is insufficient evidence of representation.
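The contrast between activation-only localization and the causal test described above can be illustrated with a toy sketch. It is not BrainCause's implementation and rests on several assumptions: the linear `encode` function is a hypothetical stand-in for a learned image-to-fMRI encoding model, "images" are reduced to binary feature vectors (dog present, grass present), and the function names, the mean-difference criterion, and the zero margin are illustrative choices. A real analysis would also need statistical testing across images and subjects.

```python
import numpy as np

# Hypothetical stand-in for an image-to-fMRI encoding model: each "image" is a
# binary feature vector [dog present, grass present], mapped linearly to 3 voxels.
# Voxel 0 tracks the concept (dog), voxel 1 tracks a correlated cue (grass),
# and voxel 2 is unresponsive.
W = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])


def encode(features):
    """Predicted responses with shape (n_images, n_voxels)."""
    return np.asarray(features, dtype=float) @ W


def activation_localizer(resp_concept, resp_other, margin=0.0):
    """Baseline: voxels whose mean response to concept images exceeds
    their mean response to other-concept images."""
    return resp_concept.mean(axis=0) - resp_other.mean(axis=0) > margin


def causal_localizer(resp_concept, resp_counterfactual, resp_distractor_sets, margin=0.0):
    """Keep a voxel only if removing the concept lowers its response AND the
    concept images beat every set of correlated-distractor images."""
    # Paired contrast: row i of the counterfactual set is concept image i with
    # the concept edited out, so other image content is held fixed.
    passes_edit = (resp_concept - resp_counterfactual).mean(axis=0) > margin
    mean_concept = resp_concept.mean(axis=0)
    passes_distractors = np.ones(resp_concept.shape[1], dtype=bool)
    for resp_d in resp_distractor_sets:
        passes_distractors &= mean_concept - resp_d.mean(axis=0) > margin
    return passes_edit & passes_distractors


# Toy stimulus sets (4 images each): dogs on grass, the same scenes with the
# dog edited out, other concepts without grass, and grass-only distractors.
concept = encode([[1, 1]] * 4)
counterfactual = encode([[0, 1]] * 4)
other = encode([[0, 0]] * 4)
grass_only = encode([[0, 1]] * 4)

act = activation_localizer(concept, other)
causal = causal_localizer(concept, counterfactual, [grass_only])
false_pos = act & ~causal  # voxels localized by activation but rejected causally
print(act, causal, false_pos)
```

In this toy setup, voxel 1, which tracks only the correlated grass cue, passes the activation test but fails both the counterfactual-edit and the distractor checks, mirroring the kind of false positive the causal validation is meant to catch.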