Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

Abstract

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements in word error rate. This study examines the use of text injection for auxiliary tasks, which are the non-ASR tasks often performed by an end-to-end (E2E) model. In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model that performs two auxiliary tasks. The first is capitalization, which is a de-normalization task. The second is turn-taking prediction, which attempts to identify whether a user has completed their conversation turn in a digital assistant interaction. We show results demonstrating that our text injection method boosts capitalization performance for long-tail data and improves turn-taking detection recall.

