MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification
December 1, 2025
Authors: Xabier de Zuazo, Ibon Saratxaga, Eva Navas
cs.AI
Abstract
We present Conformer-based decoders for the LibriBrain 2025 PNPL competition, targeting two foundational MEG tasks: Speech Detection and Phoneme Classification. Our approach adapts a compact Conformer to raw 306-channel MEG signals, with a lightweight convolutional projection layer and task-specific heads. For Speech Detection, a MEG-oriented SpecAugment provided a first exploration of MEG-specific augmentation. For Phoneme Classification, we used inverse-square-root class weighting and a dynamic grouping loader to handle 100-sample averaged examples. In addition, simple instance-level normalization proved critical for mitigating distribution shift on the holdout split. Using the official Standard track splits and F1-macro for model selection, our best systems achieved 88.9% (Speech Detection) and 65.8% (Phoneme Classification) on the leaderboard, surpassing the competition baselines and ranking in the top ten on both tasks. Implementation details, technical documentation, source code, and model checkpoints are available at https://github.com/neural2speech/libribrain-experiments.
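The abstract highlights instance-level normalization as critical for handling distribution shift on the holdout split. A minimal sketch of what per-example, per-channel z-scoring of a raw MEG window could look like (this is an illustration, not the authors' implementation; the function name and shapes are assumptions):

```python
import numpy as np

def instance_normalize(x, eps=1e-8):
    """Normalize each channel of a single example to zero mean and unit
    variance over time, so per-recording amplitude shifts are removed.
    x: array of shape (channels, time). Hypothetical helper."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

# Fake 306-channel MEG window with an arbitrary offset and scale.
meg = np.random.randn(306, 250) * 3.0 + 5.0
normed = instance_normalize(meg)
```

Because the statistics are computed per example rather than from the training set, the normalization adapts automatically to sessions whose overall signal level differs from the training distribution.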
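For the Phoneme Classification imbalance, the abstract mentions inverse-square-root class weighting. One plausible reading, sketched here under that assumption (the helper name and normalization-to-mean-one are illustrative choices, not taken from the paper):

```python
import math
from collections import Counter

def inv_sqrt_class_weights(labels):
    """Give each class a loss weight proportional to 1/sqrt(count),
    normalized so the average weight is 1. Rare classes are up-weighted,
    but less aggressively than with plain inverse-frequency weighting."""
    counts = Counter(labels)
    raw = {c: 1.0 / math.sqrt(n) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

# A class with 4x fewer examples gets 2x the weight (sqrt of the ratio).
weights = inv_sqrt_class_weights(["a"] * 100 + ["b"] * 25)
```

Such a weight table can then be passed, for example, to a weighted cross-entropy loss during training.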
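The MEG-oriented SpecAugment used for Speech Detection is not specified in detail here; a common adaptation of SpecAugment's masking idea to raw multichannel signals is to zero random time spans and random channel blocks. The sketch below is only an assumption about what such an augmentation might look like (function name and mask parameters are hypothetical):

```python
import numpy as np

def meg_specaugment(x, rng, n_time_masks=2, max_time=20,
                    n_chan_masks=2, max_chans=10):
    """SpecAugment-style masking on a raw MEG window of shape
    (channels, time): zero out random time spans and channel blocks."""
    x = x.copy()
    n_chans, n_steps = x.shape
    for _ in range(n_time_masks):
        width = rng.integers(1, max_time + 1)
        start = rng.integers(0, n_steps - width + 1)
        x[:, start:start + width] = 0.0
    for _ in range(n_chan_masks):
        width = rng.integers(1, max_chans + 1)
        start = rng.integers(0, n_chans - width + 1)
        x[start:start + width, :] = 0.0
    return x

rng = np.random.default_rng(0)
window = np.ones((306, 250))
augmented = meg_specaugment(window, rng)
```

Masking channel blocks rather than frequency bins is the natural analogue for sensor arrays, since adjacent MEG channels play the role that adjacent spectrogram rows play in audio.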