Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
October 17, 2024
Authors: Sreyan Ghosh, Mohammad Sadegh Rasooli, Michael Levit, Peidong Wang, Jian Xue, Dinesh Manocha, Jinyu Li
cs.AI
Abstract
Generative Error Correction (GEC) has emerged as a powerful post-processing
method to enhance the performance of Automatic Speech Recognition (ASR)
systems. However, we show that GEC models struggle to generalize beyond the
specific types of errors encountered during training, limiting their ability to
correct new, unseen errors at test time, particularly in out-of-domain (OOD)
scenarios. This phenomenon amplifies with named entities (NEs), where, in
addition to insufficient contextual information or knowledge about the NEs,
novel NEs keep emerging. To address these issues, we propose DARAG (Data- and
Retrieval-Augmented Generative Error Correction), a novel approach designed to
improve GEC for ASR in in-domain (ID) and OOD scenarios. We augment the GEC
training dataset with synthetic data generated by prompting LLMs and
text-to-speech models, thereby simulating additional errors from which the
model can learn. For OOD scenarios, we simulate test-time errors from new
domains similarly and in an unsupervised fashion. Additionally, to better
handle named entities, we introduce retrieval-augmented correction by
augmenting the input with entities retrieved from a database. Our approach is
simple, scalable, and both domain- and language-agnostic. We experiment on
multiple datasets and settings, showing that DARAG outperforms all our
baselines, achieving 8%-30% relative WER improvements in ID and 10%-33%
improvements in OOD settings.
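To make the retrieval-augmented correction idea concrete, the sketch below shows one plausible shape for it: rank entities from a database by string similarity to the ASR hypothesis and prepend the top matches to the prompt handed to the correction LLM. This is an illustrative assumption, not the paper's actual implementation; the entity database, the `difflib`-based retriever, and the prompt template are all hypothetical stand-ins.

```python
import difflib

# Hypothetical named-entity database; a real system would use a
# domain-specific store and a stronger (e.g. phonetic or dense) retriever.
ENTITY_DB = ["Sreyan Ghosh", "Jinyu Li", "LibriSpeech", "Dinesh Manocha"]

def retrieve_entities(hypothesis: str, db: list[str], k: int = 2) -> list[str]:
    """Rank database entities by fuzzy similarity to the ASR hypothesis."""
    scored = sorted(
        db,
        key=lambda e: difflib.SequenceMatcher(
            None, e.lower(), hypothesis.lower()
        ).ratio(),
        reverse=True,
    )
    return scored[:k]

def build_gec_prompt(hypothesis: str, entities: list[str]) -> str:
    """Augment the GEC input with retrieved entities before LLM correction."""
    context = "; ".join(entities)
    return (
        f"Relevant entities: {context}\n"
        f"ASR hypothesis: {hypothesis}\n"
        f"Corrected transcript:"
    )

hyp = "the talk by shreyan gosh on speech recognition"
ents = retrieve_entities(hyp, ENTITY_DB)
prompt = build_gec_prompt(hyp, ents)
```

With the mis-recognized name "shreyan gosh" in the hypothesis, the fuzzy retriever surfaces the correct spelling "Sreyan Ghosh" from the database, giving the correction model the context it would otherwise lack for a novel or out-of-domain entity.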