Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
August 14, 2023
Authors: Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
cs.AI
Abstract
Text injection for automatic speech recognition (ASR), wherein unpaired
text-only data is used to supplement paired audio-text data, has shown
promising improvements in word error rate. This study examines the use of text
injection for auxiliary tasks, which are the non-ASR tasks often performed by
an end-to-end (E2E) model. In this work, we use joint end-to-end and internal language model
training (JEIT) as our text injection algorithm to train an ASR model that
performs two auxiliary tasks. The first is capitalization, which is a
de-normalization task. The second is turn-taking prediction, which attempts to
identify whether a user has completed their conversation turn in a digital
assistant interaction. We show results demonstrating that our text injection
method boosts capitalization performance for long-tail data, and improves
turn-taking detection recall.