Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

October 9, 2025
作者: Yi-Cheng Lin, Yu-Hsuan Li Liang, Hsuan Su, Tzu-Quan Lin, Shang-Tse Chen, Yun-Nung Chen, Hung-yi Lee
cs.AI

Abstract

Robust ASR under domain shift is crucial because real-world systems encounter unseen accents and domains with limited labeled data. Although pseudo-labeling offers a practical workaround, it often introduces systematic, accent-specific errors that filtering fails to fix. We ask: How can we correct these recurring biases without target ground truth? We propose a simple parameter-space correction: in a source domain containing both real and pseudo-labeled data, two ASR models are fine-tuned from the same initialization, one on ground-truth labels and the other on pseudo-labels, and their weight difference forms a correction vector that captures pseudo-label biases. When applied to a pseudo-labeled target model, this vector enhances recognition, achieving up to a 35% relative Word Error Rate (WER) reduction on AfriSpeech-200 across ten African accents with the Whisper tiny model.
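
Since the correction is plain arithmetic over model weights, a minimal sketch may help make the idea concrete. The snippet below assumes PyTorch state_dicts for three Whisper-tiny checkpoints (source model fine-tuned on real labels, source model fine-tuned on pseudo-labels, pseudo-labeled target model); the function names, file paths, and the scaling factor alpha are illustrative assumptions, not taken from the paper.

```python
import torch

def correction_vector(real_sd, pseudo_sd):
    # Weight difference between the source model fine-tuned on ground-truth
    # labels and the one fine-tuned on pseudo-labels (same initialization).
    return {k: real_sd[k] - pseudo_sd[k] for k in real_sd}

def apply_correction(target_sd, vector, alpha=1.0):
    # Shift the pseudo-labeled target model by the correction vector.
    # alpha is an illustrative scaling coefficient, not specified in the abstract.
    return {k: target_sd[k] + alpha * vector[k] for k in target_sd}

# Usage sketch (checkpoint paths are hypothetical):
# source_real   = torch.load("whisper_tiny_source_real.pt")    # fine-tuned on real labels
# source_pseudo = torch.load("whisper_tiny_source_pseudo.pt")  # fine-tuned on pseudo-labels
# target_pseudo = torch.load("whisper_tiny_target_pseudo.pt")  # pseudo-labeled target model
# vec = correction_vector(source_real, source_pseudo)
# corrected = apply_correction(target_pseudo, vec)
# torch.save(corrected, "whisper_tiny_target_corrected.pt")
```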