
Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices

September 2, 2025
Authors: Evan King, Adam Sabra, Manjunath Kudlur, James Wang, Pete Warden
cs.AI

Abstract

We present the Flavors of Moonshine, a suite of tiny automatic speech recognition (ASR) models specialized for a range of underrepresented languages. Prevailing wisdom suggests that multilingual ASR models outperform monolingual counterparts by exploiting cross-lingual phonetic similarities. We challenge this assumption, showing that for sufficiently small models (27M parameters), training monolingual systems on a carefully balanced mix of high-quality human-labeled, pseudo-labeled, and synthetic data yields substantially superior performance. On average, our models achieve error rates 48% lower than the comparably sized Whisper Tiny model, outperform the 9x larger Whisper Small model, and in most cases match or outperform the 28x larger Whisper Medium model. These results advance the state of the art for models of this size, enabling accurate on-device ASR for languages that previously had limited support. We release Arabic, Chinese, Japanese, Korean, Ukrainian, and Vietnamese Moonshine models under a permissive open-source license.
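The headline figures in the abstract are relative quantities, so a short sketch may help in reading them. The Python below shows one common way to compute a relative error-rate reduction, and the parameter counts implied by the stated 9x and 28x size ratios against a 27M-parameter model. The function names and the example error rates are hypothetical and chosen only for illustration; they are not results from the paper, and the paper's exact averaging procedure across languages is not stated in the abstract.

```python
MOONSHINE_PARAMS_M = 27  # model size in millions of parameters, from the abstract


def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Return the fraction by which model_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


def implied_params_millions(size_ratio: float) -> float:
    """Parameter count (in millions) implied by a size ratio relative to 27M."""
    return MOONSHINE_PARAMS_M * size_ratio


# Hypothetical error rates (percent), NOT taken from the paper:
# a baseline at 25.0 and a model at 13.0 give a 48% relative reduction.
example_reduction = relative_error_reduction(baseline_error=25.0, model_error=13.0)

# Sizes implied by the "9x larger" and "28x larger" statements in the abstract.
implied_small = implied_params_millions(9)
implied_medium = implied_params_millions(28)

if __name__ == "__main__":
    print(f"relative reduction: {example_reduction:.0%}")
    print(f"9x model:  ~{implied_small:.0f}M parameters")
    print(f"28x model: ~{implied_medium:.0f}M parameters")
```

Note that a 48% lower error rate is a relative change: it roughly halves the baseline error, whatever the baseline is, and does not mean a drop of 48 percentage points.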