
Scaling laws for language encoding models in fMRI

May 19, 2023
Authors: Richard Antonello, Aditya Vaidya, Alexander G. Huth
cs.AI

Abstract

Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales log-linearly with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar log-linear behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.
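As a rough illustration of the encoding-model setup described in the abstract, the sketch below ridge-regresses language-model features onto voxel responses and scores each voxel by Pearson correlation on a held-out test set. This is not the authors' pipeline: the array shapes, random stand-in data, and the choice of a plain `Ridge` with a fixed `alpha` are illustrative assumptions, and steps such as feature extraction, hemodynamic delay modeling, and regularization search are omitted.

```python
# Minimal sketch of a voxelwise encoding model (illustrative only).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Assumed shapes: (time points, feature dim) for language-model features
# aligned to the fMRI acquisition, and (time points, voxels) for BOLD data.
X_train = rng.standard_normal((3000, 1024))   # stand-in for LM hidden states
Y_train = rng.standard_normal((3000, 500))    # stand-in for voxel responses
X_test = rng.standard_normal((300, 1024))
Y_test = rng.standard_normal((300, 500))

# Fit one linear map from features to all voxels (Ridge supports multi-output).
model = Ridge(alpha=1.0)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Encoding performance: correlation between predicted and actual time courses,
# computed independently for each voxel on the held-out split.
def voxelwise_corr(pred, actual):
    pred_z = (pred - pred.mean(0)) / pred.std(0)
    actual_z = (actual - actual.mean(0)) / actual.std(0)
    return (pred_z * actual_z).mean(0)

corrs = voxelwise_corr(Y_pred, Y_test)
print(f"Mean held-out correlation across voxels: {corrs.mean():.3f}")
```

In the paper's framing, this held-out correlation is the quantity that scales roughly log-linearly with language-model size and with the amount of fMRI training data.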
