AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders

Abstract

Sparse Autoencoders (SAEs) are powerful tools for interpreting neural representations, yet their use in audio remains underexplored. We train SAEs on all encoder layers of Whisper and HuBERT, evaluate their stability and interpretability extensively, and demonstrate their practical utility. Over 50% of the features remain consistent across random seeds, and reconstruction quality is preserved. SAE features capture general acoustic and semantic information as well as specific events, including environmental noises and paralinguistic sounds (e.g., laughter and whispering), and disentangle them effectively: erasing a concept requires removing only 19-27% of the features. Feature steering reduces Whisper's false speech detections by 70% with a negligible increase in word error rate (WER), demonstrating real-world applicability. Finally, we find that SAE features correlate with human EEG activity during speech perception, indicating alignment with human neural processing. The code and checkpoints are available at https://github.com/audiosae/audiosae_demo.

