Application-Agnostic Language Modeling for On-Device ASR

Abstract

On-device automatic speech recognition (ASR) systems face several challenges compared to server-based systems. They must meet stricter constraints on speed, disk size, and memory while maintaining the same accuracy. They often also have to serve several applications with different distributions at once, such as communicating with a virtual assistant and speech-to-text. The simplest way to serve multiple applications is to build application-specific (language) models, but this increases memory usage. Therefore, we explore different data- and architecture-driven language modeling approaches to build a single application-agnostic model. We propose two novel feed-forward architectures that find an optimal trade-off between the different on-device constraints. Compared to the application-specific solution, one of our novel approaches reduces the disk size by half while maintaining the speed and accuracy of the original model.
