
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders

February 4, 2026
作者: Georgii Aparin, Tasnima Sadekova, Alexey Rukhovich, Assel Yermekova, Laida Kushnareva, Vadim Popov, Kristian Kuznetsov, Irina Piontkovskaya
cs.AI

Abstract

Sparse Autoencoders (SAEs) are powerful tools for interpreting neural representations, yet their use in audio remains underexplored. We train SAEs across all encoder layers of Whisper and HuBERT, provide an extensive evaluation of their stability and interpretability, and show their practical utility. Over 50% of the features remain consistent across random seeds, and reconstruction quality is preserved. SAE features capture general acoustic and semantic information as well as specific events, including environmental noises and paralinguistic sounds (e.g., laughter, whispering), and disentangle them effectively, requiring removal of only 19-27% of features to erase a concept. Feature steering reduces Whisper's false speech detections by 70% with a negligible WER increase, demonstrating real-world applicability. Finally, we find SAE features correlated with human EEG activity during speech perception, indicating alignment with human neural processing. The code and checkpoints are available at https://github.com/audiosae/audiosae_demo.
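To make the setup concrete, the following is a minimal NumPy sketch of the general SAE recipe the abstract describes: encode a model's hidden activations into a wide, non-negative (hence sparse) feature vector, reconstruct the activations, and train against a reconstruction loss plus an L1 sparsity penalty. It also shows feature ablation, the mechanism behind the concept-erasure result. All dimensions, indices, and weights here are hypothetical placeholders, not the paper's actual configuration or trained checkpoints.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: real Whisper/HuBERT hidden sizes and SAE widths differ.
d_model, d_sae = 64, 512
W_enc = rng.normal(0.0, 0.02, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.02, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations into non-negative sparse features, then reconstruct."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps the code non-negative
    x_hat = f @ W_dec + b_dec
    return f, x_hat

# A batch of (synthetic) encoder-layer activations.
x = rng.normal(size=(8, d_model))
f, x_hat = sae_forward(x)

# Training objective: reconstruction error plus an L1 penalty driving sparsity.
l1_coeff = 1e-3
loss = np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(f).mean()

# Concept-erasure sketch: zero a (hypothetical) set of features associated
# with one concept before decoding, analogous to removing 19-27% of features.
concept_idx = np.arange(100)  # placeholder indices, not real concept features
f_ablated = f.copy()
f_ablated[:, concept_idx] = 0.0
x_erased = f_ablated @ W_dec + b_dec
```

Feature steering works the same way in the opposite direction: instead of zeroing selected features, their activations are scaled or offset before decoding, and the edited reconstruction is fed back into the host model.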