From Activation to Causation: Discovering Causal Visual Representations in the Human Brain

Abstract

Identifying which regions of the human brain represent a given visual concept is a central challenge in neuroscience. Existing approaches localize coarse functional regions (e.g., faces, places) through activation maximization, identifying regions that respond more strongly to a target concept than to other concepts. Yet strong activation alone does not establish that a region represents the concept itself, because the response may instead be driven by correlated visual or semantic cues. We introduce BrainCause, an automated framework that combines generative models and brain models to synthesize controlled stimuli and validate neural representations through targeted causal testing. Given a query specifying a concept of interest, the framework constructs a targeted stimulus set comprising concept images, counterfactual edits that remove the target concept while preserving the rest of the image content, and images containing candidate correlated distractors. It then uses an image-to-fMRI encoding model to predict brain responses and searches for representations that respond specifically to the target concept rather than to the correlated alternatives. BrainCause returns validated candidate representations and proposes follow-up fMRI experiments to further test or extend its discoveries. The approach recovers known functional localizations and identifies new candidate representations across dozens of concepts, with results validated on both predicted and measured fMRI data. Critically, we show that without causal validation a large fraction of localizations would be false positives, confirming that activation alone is insufficient evidence of representation.
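The contrast between activation-only localization and the validation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the statistic BrainCause uses, so everything here is an assumption made for illustration: the function name `select_voxels`, the mean-difference rule, the `margin` threshold, and the toy response values (which are made up and are not results from the work).

```python
import numpy as np

def select_voxels(concept, counterfactual, distractor, baseline, margin=0.5):
    """Return (activation_only, validated) boolean masks over voxels.

    Each argument holds predicted responses with shape (n_images, n_voxels).
    The names and the margin rule are hypothetical, not the authors' method.
    """
    mean_c = concept.mean(axis=0)
    # Activation-only criterion: concept images beat unrelated baseline images.
    activation_only = mean_c - baseline.mean(axis=0) > margin
    # Validation criteria: the response must drop when the concept is edited
    # out, and must exceed the response to correlated distractor images.
    drops_when_removed = mean_c - counterfactual.mean(axis=0) > margin
    beats_distractors = mean_c - distractor.mean(axis=0) > margin
    validated = activation_only & drops_when_removed & beats_distractors
    return activation_only, validated

# Toy predicted responses: 2 images per condition (rows), 3 voxels (columns).
concept        = np.array([[2.0, 2.0, 0.1], [2.2, 1.8, 0.0]])
counterfactual = np.array([[0.2, 1.9, 0.1], [0.1, 1.9, 0.0]])
distractor     = np.array([[0.3, 1.7, 0.0], [0.2, 2.0, 0.1]])
baseline       = np.array([[0.0, 0.1, 0.0], [0.1, 0.0, 0.1]])

activation_only, validated = select_voxels(
    concept, counterfactual, distractor, baseline
)
print("activation only:", activation_only)  # voxels 0 and 1 pass
print("validated:      ", validated)        # only voxel 0 passes
```

In this toy setup the second voxel responds strongly to concept images but responds just as strongly once the concept is edited out, so an activation-only criterion would flag it while the counterfactual and distractor checks reject it. This is the kind of false positive the abstract refers to.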