Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

Abstract

Robust automatic speech recognition (ASR) under domain shift is crucial because real-world systems encounter unseen accents and domains for which labeled data is limited. Pseudo-labeling offers a practical workaround, but it often introduces systematic, accent-specific errors that filtering fails to fix. We ask: how can these recurring biases be corrected without target ground truth? We propose a simple parameter-space correction. In a source domain containing both real and pseudo-labeled data, two ASR models are fine-tuned from the same initialization, one on ground-truth labels and the other on pseudo-labels; their weight difference forms a correction vector that captures pseudo-label biases. When applied to a pseudo-labeled target model, this vector improves recognition, achieving up to a 35% relative Word Error Rate (WER) reduction on AfriSpeech-200 across ten African accents with the Whisper tiny model.
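
The parameter-space correction described in the abstract can be illustrated with a minimal sketch. It assumes the correction vector is the source model trained on real labels minus the source model trained on pseudo-labels, and that it is added to the pseudo-labeled target model. The helper names, the `scale` factor, and the toy weights are all hypothetical; they are not taken from the paper, and real checkpoints would be framework tensors rather than Python lists.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of values.
Params = Dict[str, List[float]]


def correction_vector(real_src: Params, pseudo_src: Params) -> Params:
    """Weight difference between the two source models (real minus pseudo).

    Both models are assumed to share the same initialization and parameter names.
    """
    return {
        name: [r - p for r, p in zip(real_src[name], pseudo_src[name])]
        for name in real_src
    }


def apply_correction(pseudo_tgt: Params, delta: Params, scale: float = 1.0) -> Params:
    """Add the correction vector to the pseudo-labeled target model.

    `scale` is an illustrative assumption, not a setting from the paper.
    """
    return {
        name: [w + scale * d for w, d in zip(pseudo_tgt[name], delta[name])]
        for name in pseudo_tgt
    }


# Toy weights standing in for fine-tuned checkpoints (illustrative values only).
theta_src_real = {"layer.weight": [0.5, -1.0], "layer.bias": [0.25]}
theta_src_pseudo = {"layer.weight": [0.25, -0.5], "layer.bias": [0.0]}
theta_tgt_pseudo = {"layer.weight": [1.0, 2.0], "layer.bias": [-0.5]}

delta = correction_vector(theta_src_real, theta_src_pseudo)
theta_tgt_corrected = apply_correction(theta_tgt_pseudo, delta)
print(theta_tgt_corrected)
```

The sketch needs no target ground truth: only the two source models and the pseudo-labeled target model enter the computation, which matches the setting the abstract describes.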
