Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

Abstract

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. A field-wide goal is to achieve generalizable, cross-subject models. A major obstacle to this goal is the substantial variability in neural representations across individuals, which has so far required training bespoke models or fine-tuning separately for each subject. To address this challenge, we introduce a meta-optimized approach for semantic visual decoding from fMRI that generalizes to novel subjects without any fine-tuning. By conditioning only on a small set of image-brain activation examples from the new individual, our model rapidly infers that individual's unique neural encoding patterns, enabling robust and efficient visual decoding. Our approach is explicitly optimized for in-context learning of the new subject's encoding model and performs decoding by hierarchical inference, inverting the encoder. First, for multiple brain regions, we estimate the per-voxel visual response encoder parameters by constructing a context over multiple stimuli and responses. Second, we construct a context consisting of encoder parameters and response values over multiple voxels to perform aggregated functional inversion. We demonstrate strong cross-subject and cross-scanner generalization across diverse visual backbones without retraining or fine-tuning. Moreover, our approach requires neither anatomical alignment nor stimulus overlap. This work is a critical step towards a generalizable foundation model for non-invasive brain decoding.
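
The two-stage logic described in the abstract (estimate per-voxel encoders from a context of stimuli and responses, then invert those encoders jointly across voxels) can be illustrated with a deliberately simplified linear stand-in. The sketch below is not the paper's meta-learned in-context model: it swaps in closed-form ridge regression for both stages and runs on synthetic data. The function names (`estimate_encoders`, `invert_encoders`, `demo`), the linear response model, and every parameter value are assumptions made purely for illustration.

```python
import numpy as np


def estimate_encoders(context_feats, context_resps, ridge=1e-3):
    """Stage 1 stand-in: fit one linear encoder per voxel from the context.

    context_feats: (n_stimuli, d) image features (hypothetical backbone output)
    context_resps: (n_stimuli, n_voxels) measured responses
    Returns W of shape (d, n_voxels), one weight column per voxel.
    """
    d = context_feats.shape[1]
    gram = context_feats.T @ context_feats + ridge * np.eye(d)
    return np.linalg.solve(gram, context_feats.T @ context_resps)


def invert_encoders(weights, response, ridge=1e-3):
    """Stage 2 stand-in: recover a feature vector z with response ~= W.T @ z.

    weights:  (d, n_voxels) encoder weights from stage 1
    response: (n_voxels,) responses to one unseen stimulus
    Solves the ridge-regularized least-squares problem over all voxels jointly.
    """
    d = weights.shape[0]
    gram = weights @ weights.T + ridge * np.eye(d)
    return np.linalg.solve(gram, weights @ response)


def demo(seed=0, n_stimuli=200, n_voxels=500, d=16, noise=0.01):
    """Synthetic check: cosine similarity between true and decoded features."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=(d, n_voxels))      # assumed ground-truth encoders
    feats = rng.normal(size=(n_stimuli, d))      # context stimuli features
    resps = feats @ true_w + noise * rng.normal(size=(n_stimuli, n_voxels))

    w_hat = estimate_encoders(feats, resps)      # stage 1: context -> encoders

    z_true = rng.normal(size=d)                  # unseen stimulus feature
    r_new = z_true @ true_w + noise * rng.normal(size=n_voxels)
    z_hat = invert_encoders(w_hat, r_new)        # stage 2: invert across voxels

    cos = z_hat @ z_true / (np.linalg.norm(z_hat) * np.linalg.norm(z_true))
    return float(cos)


if __name__ == "__main__":
    print(f"cosine similarity of decoded feature: {demo():.4f}")
```

In this toy setting no subject-specific training loop is run: the "new subject" is characterized only by the context pairs passed to `estimate_encoders`, which loosely mirrors the training-free conditioning described above. The approach in the abstract replaces both closed-form solves with a model optimized for in-context learning.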
