Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

June 5, 2026
Authors: Georgii Aparin, Vadim Popov, Tasnima Sadekova, Assel Yermekova
cs.AI

Abstract

Whisper, a widely adopted automatic speech recognition (ASR) model, is known to hallucinate: given non-speech audio, it generates coherent transcriptions that are entirely disconnected from the input. We investigate whether these hallucinations can be detected and mitigated through Whisper's internal representations. We extract audio encoder activations and evaluate two representation spaces: raw Whisper activations and Sparse AutoEncoder (SAE) latents. We show that both spaces encode linearly separable hallucination-related information, with discriminative power concentrated in a sparse subset of features and increasing toward deeper encoder layers. We propose two steering strategies: activation-space steering and SAE latent-space steering. On the full non-speech test set, SAE-based steering reduces the hallucination rate from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3, with only a small word error rate (WER) degradation on speech data, approaching the performance of fine-tuning-based methods.
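
The two steering ideas named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the authors' exact procedure, so everything here is an assumption: the difference-of-means direction, the midpoint-threshold probe, the choice to add the SAE reconstruction error back after editing latents, and all function and variable names are hypothetical. The data are synthetic Gaussian vectors and the SAE weights are random, standing in for real Whisper encoder activations and a trained SAE.

```python
import numpy as np

def mean_difference_direction(halluc_acts, clean_acts):
    """Unit vector from the clean-class mean to the hallucination-class mean."""
    d = halluc_acts.mean(axis=0) - clean_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def probe_accuracy(halluc_acts, clean_acts, direction):
    """1-D linear probe: project onto `direction`, threshold at the midpoint of class means."""
    ph = halluc_acts @ direction
    pc = clean_acts @ direction
    threshold = 0.5 * (ph.mean() + pc.mean())
    correct = (ph > threshold).sum() + (pc <= threshold).sum()
    return correct / (len(ph) + len(pc))

def steer_activations(acts, direction, alpha=1.0):
    """Activation-space steering (assumed form): remove alpha times the component along `direction`."""
    coeff = acts @ direction
    return acts - alpha * np.outer(coeff, direction)

def steer_sae_latents(x, W_enc, b_enc, W_dec, b_dec, latent_ids, scale=0.0):
    """SAE latent-space steering (assumed form): rescale chosen latents, decode,
    and add back the SAE reconstruction error so untouched content is preserved."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU encoder
    recon = z @ W_dec + b_dec                # unmodified reconstruction
    z_steered = z.copy()
    z_steered[:, latent_ids] *= scale        # scale=0.0 ablates the latents
    return z_steered @ W_dec + b_dec + (x - recon)

# Synthetic stand-ins for encoder activations (NOT real Whisper data).
rng = np.random.default_rng(0)
dim, n_latents = 64, 256
true_dir = np.zeros(dim)
true_dir[0] = 1.0
clean = rng.normal(size=(500, dim))
halluc = rng.normal(size=(500, dim)) + 4.0 * true_dir

direction = mean_difference_direction(halluc, clean)
acc = probe_accuracy(halluc, clean, direction)
steered = steer_activations(halluc, direction, alpha=1.0)
residual = np.abs(steered @ direction).max()  # component left along the direction

# Random (untrained) SAE weights, used only to exercise the function.
W_enc = rng.normal(size=(dim, n_latents)) / np.sqrt(dim)
W_dec = rng.normal(size=(n_latents, dim)) / np.sqrt(n_latents)
b_enc = np.zeros(n_latents)
b_dec = np.zeros(dim)
sae_steered = steer_sae_latents(halluc, W_enc, b_enc, W_dec, b_dec, latent_ids=[3])
```

With a real model, `clean` and `halluc` would be replaced by activations collected from a chosen encoder layer, and `latent_ids` by the latents a probe identifies as hallucination-related. The probe here is evaluated on the same data used to estimate the direction, so its accuracy is optimistic; a held-out split would be needed for a fair estimate.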