Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

January 19, 2026
Authors: Warit Sirichotedumrong, Adisai Na-Thalang, Potsawee Manakul, Pittawat Taveekitworachai, Sittipong Sripaisarnmongkol, Kunat Pipatanakul
cs.AI

Abstract

Large encoder-decoder models such as Whisper achieve strong offline transcription but remain impractical for streaming applications because of their high latency. Yet because pre-trained checkpoints are readily accessible, the open Thai ASR landscape remains dominated by these offline architectures, leaving a critical gap in efficient streaming solutions. We present Typhoon ASR Real-time, a 115M-parameter FastConformer-Transducer model for low-latency Thai speech recognition. We demonstrate that rigorous text normalization can match the impact of model scaling: our compact model achieves a 45x reduction in computational cost compared to Whisper Large-v3 while delivering comparable accuracy. Our normalization pipeline resolves systemic ambiguities in Thai transcription, including context-dependent number verbalization and repetition markers (mai yamok), creating consistent training targets. We further introduce a two-stage curriculum learning approach for Isan (north-eastern) dialect adaptation that preserves Central Thai performance. To address reproducibility challenges in Thai ASR, we release the Typhoon ASR Benchmark, a gold-standard human-labeled dataset with transcriptions following established Thai linguistic conventions, providing standardized evaluation protocols for the research community.
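
To make the repetition-marker ambiguity concrete, the following is a minimal sketch of one possible normalization rule: replacing mai yamok (ๆ, U+0E46) with an explicit repetition of the preceding word. It is an illustration only, not the authors' pipeline. The abstract does not say which canonical form the pipeline targets, the function name `expand_mai_yamok` is hypothetical, and the input is assumed to be already segmented into word tokens (Thai is written without spaces, so a real system would need a word segmenter first).

```python
# Illustrative sketch only; NOT the Typhoon ASR normalization pipeline.
# Assumption: the input is a list of word tokens produced by some Thai
# word segmenter, and the marker repeats exactly one preceding word.

MAI_YAMOK = "\u0e46"  # Thai repetition mark (ๆ)


def expand_mai_yamok(tokens):
    """Return a new token list with each repetition mark spelled out."""
    out = []
    for tok in tokens:
        if tok == MAI_YAMOK and out:
            # Standalone marker: repeat the previous word.
            out.append(out[-1])
        elif tok.endswith(MAI_YAMOK) and len(tok) > 1:
            # Marker attached to the word itself: emit the word twice.
            word = tok[:-1]
            out.extend([word, word])
        else:
            # Ordinary token (or a marker with nothing before it): keep as is.
            out.append(tok)
    return out
```

With a rule like this, a transcript written with the marker and one written with the word spelled out twice map to the same training target, which is the kind of consistency the abstract describes. Phrase-level repetition, where the marker repeats more than one word, is outside the scope of this sketch.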