Scaling laws for language encoding models in fMRI

Abstract

Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales log-linearly with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar log-linear behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.
