Beyond Transcription: Mechanistic Interpretability in ASR
August 21, 2025
Authors: Neta Glazer, Yael Segal-Feldman, Hilit Segev, Aviv Shamsian, Asaf Buchnick, Gill Hetz, Ethan Fetaya, Joseph Keshet, Aviv Navon
cs.AI
Abstract
Interpretability methods have recently gained significant attention,
particularly in the context of large language models, enabling insights into
linguistic representations, error detection, and model behaviors such as
hallucinations and repetitions. However, these techniques remain underexplored
in automatic speech recognition (ASR), despite their potential to advance both
the performance and interpretability of ASR systems. In this work, we adapt and
systematically apply established interpretability methods, such as logit lens,
linear probing, and activation patching, to examine how acoustic and semantic
information evolves across layers in ASR systems. Our experiments reveal
previously unknown internal dynamics, including specific encoder-decoder
interactions responsible for repetition hallucinations and semantic biases
encoded deep within acoustic representations. These insights demonstrate the
benefits of extending and applying interpretability techniques to speech
recognition, opening promising directions for future research on improving
model transparency and robustness.
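
For readers unfamiliar with the logit lens mentioned above, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a toy setup in which each layer's hidden state is projected through a shared unembedding matrix to see which token that layer would predict. The names `logit_lens`, `unembed`, `vocab`, and all numeric values are hypothetical, and the final layer normalization that real models usually apply before unembedding is omitted for brevity.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden_states, unembed, vocab):
    """Read out each layer's hidden state through the unembedding matrix.

    hidden_states: one vector per layer (list of lists of floats).
    unembed: one row per vocabulary entry, same width as a hidden state.
    Returns a (top_token, probability) pair for every layer.
    """
    readouts = []
    for hidden in hidden_states:
        # Logit for each vocabulary entry is a dot product with its row.
        logits = [sum(w * x for w, x in zip(row, hidden)) for row in unembed]
        probs = softmax(logits)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        readouts.append((vocab[best], probs[best]))
    return readouts


# Hypothetical toy values: three tokens, two hidden dimensions, three layers.
vocab = ["cat", "cap", "cut"]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden_states = [[0.1, 0.2], [0.5, 0.4], [2.0, 0.1]]

readouts = logit_lens(hidden_states, unembed, vocab)
for layer, (token, prob) in enumerate(readouts):
    print(f"layer {layer}: top token = {token!r}, p = {prob:.3f}")
```

In this toy example the top prediction changes between the first and later layers and grows more confident with depth, which is the kind of layer-by-layer evolution the logit lens is meant to expose.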