

Application-Agnostic Language Modeling for On-Device ASR

May 16, 2023
Authors: Markus Nußbaum-Thom, Lyan Verwimp, Youssef Oualil
cs.AI

Abstract

On-device automatic speech recognition systems face several challenges compared to server-based systems. They have to meet stricter constraints in terms of speed, disk size, and memory while maintaining the same accuracy. Often they have to serve several applications with different distributions at once, such as communicating with a virtual assistant and speech-to-text. The simplest solution for serving multiple applications is to build application-specific (language) models, but this increases the memory footprint. Therefore, we explore different data-driven and architecture-driven language modeling approaches to build a single application-agnostic model. We propose two novel feed-forward architectures that find an optimal trade-off between the different on-device constraints. In comparison to the application-specific solution, one of our novel approaches reduces the disk size by half while maintaining the speed and accuracy of the original model.
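Feed-forward language models of the kind discussed here condition on a fixed window of preceding words rather than a recurrent state, which keeps inference latency and memory use predictable on device. The snippet below is a minimal, generic sketch of such a model in PyTorch for illustration only; the class name FeedForwardLM, vocabulary size, context length, and layer dimensions are assumptions, and it does not reproduce the two architectures proposed in the paper.

# Minimal sketch of a feed-forward n-gram-style neural language model (PyTorch).
# Generic illustration only; all sizes below are assumed, not taken from the paper.
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    def __init__(self, vocab_size=10000, context_size=4, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Sequential(
            nn.Linear(context_size * embed_dim, hidden_dim),
            nn.ReLU(),
        )
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the preceding words
        emb = self.embedding(context_ids)       # (batch, context_size, embed_dim)
        flat = emb.flatten(start_dim=1)         # concatenate the context embeddings
        return self.output(self.hidden(flat))   # (batch, vocab_size) next-word logits

# Usage example: score the next word given a 4-word context.
model = FeedForwardLM()
context = torch.randint(0, 10000, (1, 4))
log_probs = torch.log_softmax(model(context), dim=-1)

Because the context window is fixed, disk size and memory are dominated by the embedding and output layers, which is why sharing a single application-agnostic model (rather than one such model per application) roughly halves the footprint reported above.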