Beyond Transcription: Mechanistic Interpretability in ASR

Abstract

Interpretability methods have recently gained significant attention, particularly in the context of large language models, enabling insights into linguistic representations, error detection, and model behaviors such as hallucinations and repetitions. However, these techniques remain underexplored in automatic speech recognition (ASR), despite their potential to advance both the performance and interpretability of ASR systems. In this work, we adapt and systematically apply established interpretability methods, such as the logit lens, linear probing, and activation patching, to examine how acoustic and semantic information evolves across layers in ASR systems. Our experiments reveal previously unknown internal dynamics, including specific encoder-decoder interactions responsible for repetition hallucinations and semantic biases encoded deep within acoustic representations. These insights demonstrate the benefits of extending and applying interpretability techniques to speech recognition, opening promising directions for future research on improving model transparency and robustness.
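
For readers unfamiliar with the logit lens named in the abstract, the following is a minimal toy sketch, not the authors' implementation. It assumes a decoder whose output projection is a single unembedding matrix, omits the final layer normalization that real models usually apply, and uses random arrays in place of real ASR activations. The names `logit_lens`, `hidden_states`, and `W_unembed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def logit_lens(hidden_states, W_unembed):
    """Project each layer's hidden state onto the vocabulary.

    hidden_states: shape (num_layers, d_model), one vector per layer for a
                   single decoding position (toy stand-in for activations).
    W_unembed:     shape (d_model, vocab_size), the assumed output projection.
    Returns an int array of shape (num_layers,) holding the token id that
    each layer would predict if decoding stopped at that layer.
    """
    logits = hidden_states @ W_unembed  # (num_layers, vocab_size)
    return logits.argmax(axis=-1)


# Toy data: random values, so the predicted ids carry no meaning.
rng = np.random.default_rng(0)
num_layers, d_model, vocab_size = 4, 8, 16
hidden_states = rng.normal(size=(num_layers, d_model))
W_unembed = rng.normal(size=(d_model, vocab_size))

top_tokens = logit_lens(hidden_states, W_unembed)
print(top_tokens)  # one token id per layer
```

With real activations, comparing the per-layer predictions shows at which depth the model's final output token first becomes the top candidate.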
