
Application-Agnostic Language Modeling for On-Device ASR

May 16, 2023
Authors: Markus Nußbaum-Thom, Lyan Verwimp, Youssef Oualil
cs.AI

Abstract

On-device automatic speech recognition systems face several challenges compared to server-based systems. They have to meet stricter constraints in terms of speed, disk size, and memory while maintaining the same accuracy. Often, they have to serve several applications with different distributions at once, such as communicating with a virtual assistant and speech-to-text. The simplest solution to serve multiple applications is to build application-specific (language) models, but this leads to an increase in memory. Therefore, we explore different data- and architecture-driven language modeling approaches to build a single application-agnostic model. We propose two novel feed-forward architectures that find an optimal trade-off between different on-device constraints. In comparison to the application-specific solution, one of our novel approaches reduces the disk size by half, while maintaining the speed and accuracy of the original model.
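
For context, the sketch below shows a generic fixed-context feed-forward language model of the kind the abstract refers to: the next word is predicted from a fixed window of preceding words, which keeps per-token computation small and predictable for on-device use. The class name FeedForwardLM and all vocabulary, context, and layer sizes are illustrative assumptions; this is not one of the two architectures proposed in the paper.

```python
# Minimal sketch of a fixed-context feed-forward neural language model.
# All names and sizes are illustrative, not the paper's proposed models.
import torch
import torch.nn as nn


class FeedForwardLM(nn.Module):
    """Predicts the next word from a fixed window of preceding words."""

    def __init__(self, vocab_size=10000, context_size=4,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Sequential(
            nn.Linear(context_size * embed_dim, hidden_dim),
            nn.ReLU(),
        )
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_size) integer word indices
        emb = self.embedding(context_ids)        # (batch, context, embed)
        flat = emb.flatten(start_dim=1)          # (batch, context * embed)
        return self.output(self.hidden(flat))    # (batch, vocab) logits


# Usage: score a batch of eight 4-word contexts.
model = FeedForwardLM()
contexts = torch.randint(0, 10000, (8, 4))
log_probs = torch.log_softmax(model(contexts), dim=-1)
print(log_probs.shape)  # torch.Size([8, 10000])
```

An application-agnostic setup as described in the abstract would serve all traffic (e.g., assistant queries and dictation) with one such model instead of one model per application, trading a single set of parameters against per-application specialization.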