Beyond Transcription: Mechanistic Interpretability in ASR

Abstract

Interpretability methods have recently gained significant attention, particularly in the context of large language models, where they provide insight into linguistic representations, support error detection, and help explain model behaviors such as hallucinations and repetitions. However, these techniques remain underexplored in automatic speech recognition (ASR), despite their potential to advance both the performance and the interpretability of ASR systems. In this work, we adapt and systematically apply established interpretability methods, such as logit lens, linear probing, and activation patching, to examine how acoustic and semantic information evolves across layers in ASR systems. Our experiments reveal previously unknown internal dynamics, including specific encoder-decoder interactions responsible for repetition hallucinations and semantic biases encoded deep within acoustic representations. These insights demonstrate the benefits of extending and applying interpretability techniques to speech recognition, and they open promising directions for future research on improving model transparency and robustness.
