EEG Foundation Models: Progresses, Benchmarking, and Open Problems
January 25, 2026
Authors: Dingkun Liu, Yuheng Chen, Zhu Chen, Zhenyao Cui, Yaozhi Wen, Jiayu An, Jingwei Luo, Dongrui Wu
cs.AI
Abstract
Electroencephalography (EEG) foundation models have recently emerged as a promising paradigm for brain-computer interfaces (BCIs), aiming to learn transferable neural representations from large-scale heterogeneous recordings. Despite rapid progress, fair and comprehensive comparisons of existing EEG foundation models are still lacking, due to inconsistent pre-training objectives, preprocessing choices, and downstream evaluation protocols. This paper fills that gap. We first review 50 representative models and organize their design choices into a unified taxonomy covering data standardization, model architectures, and self-supervised pre-training strategies. We then evaluate 12 open-source foundation models and competitive specialist baselines on 13 EEG datasets spanning nine BCI paradigms. Emphasizing real-world deployment, we consider both cross-subject generalization under a leave-one-subject-out protocol and rapid calibration in a within-subject few-shot setting. We further compare full-parameter fine-tuning with linear probing to assess the transferability of pre-trained representations, and examine the relationship between model scale and downstream performance. Our results indicate that: 1) linear probing is often insufficient; 2) specialist models trained from scratch remain competitive across many tasks; and 3) larger foundation models do not necessarily generalize better under current data regimes and training practices.
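To make the two evaluation ingredients concrete, the sketch below illustrates a leave-one-subject-out (LOSO) split combined with a linear probe, i.e., training only a linear classifier on frozen embeddings. This is a minimal illustration, not the paper's actual pipeline: the random features stand in for embeddings from a hypothetical frozen pre-trained encoder, and all sizes (6 subjects, 32-dim features) are made up for the example.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for frozen foundation-model embeddings:
# 60 EEG trials from 6 subjects, 32-dim features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 32))
y = rng.integers(0, 2, size=60)
subjects = np.repeat(np.arange(6), 10)  # subject ID for each trial

# Leave-one-subject-out: each fold holds out every trial of one subject,
# so test accuracy measures cross-subject generalization.
logo = LeaveOneGroupOut()
accs = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    # Linear probe: only this logistic-regression head is trained;
    # the (hypothetical) pre-trained encoder stays frozen upstream.
    probe = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    accs.append(accuracy_score(y[test_idx], probe.predict(X[test_idx])))

print(f"LOSO folds: {len(accs)}, mean accuracy: {np.mean(accs):.3f}")
```

Full-parameter fine-tuning would instead update the encoder weights on each fold's training subjects; the paper's finding that linear probing is often insufficient corresponds to a gap between these two regimes.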