Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex

Abstract

Understanding functional representations within higher visual cortex is a fundamental question in computational neuroscience. While artificial neural networks pretrained on large-scale datasets exhibit striking representational alignment with human neural responses, learning image-computable models of visual cortex relies on individual-level, large-scale fMRI datasets. The necessity for expensive, time-intensive, and often impractical data acquisition limits the generalizability of encoders to new subjects and stimuli. BraInCoRL uses in-context learning to predict voxelwise neural responses from few-shot examples without any additional finetuning for novel subjects and stimuli. We leverage a transformer architecture that can flexibly condition on a variable number of in-context image stimuli, learning an inductive bias over multiple subjects. During training, we explicitly optimize the model for in-context learning. By jointly conditioning on image features and voxel activations, our model learns to directly generate better-performing voxelwise models of higher visual cortex. We demonstrate that BraInCoRL consistently outperforms existing voxelwise encoder designs in a low-data regime when evaluated on entirely novel images, while also exhibiting strong test-time scaling behavior. The model also generalizes to an entirely new visual fMRI dataset, which uses different subjects and fMRI data acquisition parameters. Further, BraInCoRL facilitates better interpretability of neural signals in higher visual cortex by attending to semantically relevant stimuli. Finally, we show that our framework enables interpretable mappings from natural language queries to voxel selectivity.
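To make the in-context setting concrete, the sketch below predicts one voxel's response to a new image from a variable-length set of (image feature, voxel activation) pairs, with no fitting step. It is not the BraInCoRL model: it swaps in a single untrained softmax-attention step with identity projections (equivalent to Nadaraya-Watson kernel regression), and the function name `attention_predict`, the `temperature` parameter, and the toy data are all hypothetical, chosen only for illustration.

```python
import math


def attention_predict(query, context_features, context_responses, temperature=1.0):
    """Predict one voxel's response to `query` from in-context examples.

    Illustrative stand-in only: one softmax-attention step with identity
    projections (Nadaraya-Watson kernel regression). Nothing is trained.
    """
    if not context_features:
        raise ValueError("need at least one in-context example")
    if len(context_features) != len(context_responses):
        raise ValueError("features and responses must have the same length")

    # Dot-product similarity between the query image and each context image.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / temperature
        for feat in context_features
    ]

    # Numerically stable softmax over the context set (any length works).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The prediction is the attention-weighted average of context responses.
    prediction = sum(w * r for w, r in zip(weights, context_responses))
    return prediction, weights


# Hypothetical toy data: 2-D unit-norm "image features" and one voxel's responses.
context_features = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
context_responses = [2.0, -1.0, 1.0]

pred, weights = attention_predict(
    [1.0, 0.0], context_features, context_responses, temperature=0.1
)
print(f"predicted response: {pred:.3f}")
print("attention weights:", [round(w, 3) for w in weights])
```

The returned weights show which context stimuli the prediction leaned on, a toy analogue of inspecting attention over in-context stimuli. Adding more pairs to the context lists changes the prediction without any retraining, which is the property the abstract refers to as conditioning on a variable number of in-context examples.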
