
Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

January 19, 2026
Authors: Warit Sirichotedumrong, Adisai Na-Thalang, Potsawee Manakul, Pittawat Taveekitworachai, Sittipong Sripaisarnmongkol, Kunat Pipatanakul
cs.AI

Abstract

Large encoder-decoder models like Whisper achieve strong offline transcription but remain impractical for streaming applications due to high latency. Yet because pre-trained checkpoints are readily accessible, the open Thai ASR landscape remains dominated by these offline architectures, leaving a critical gap in efficient streaming solutions. We present Typhoon ASR Real-time, a 115M-parameter FastConformer-Transducer model for low-latency Thai speech recognition. We demonstrate that rigorous text normalization can match the impact of model scaling: our compact model achieves a 45x reduction in computational cost compared to Whisper Large-v3 while delivering comparable accuracy. Our normalization pipeline resolves systemic ambiguities in Thai transcription, including context-dependent number verbalization and repetition markers (mai yamok), creating consistent training targets. We further introduce a two-stage curriculum learning approach for Isan (northeastern) dialect adaptation that preserves Central Thai performance. To address reproducibility challenges in Thai ASR, we release the Typhoon ASR Benchmark, a gold-standard human-labeled dataset with transcriptions following established Thai linguistic conventions, providing standardized evaluation protocols for the research community.
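
To make the repetition-marker ambiguity concrete, here is a minimal Python sketch of one possible normalization rule. The abstract does not state the rule the authors use; this sketch assumes that the mai yamok character (U+0E46) is expanded into an explicit repeat of the preceding word, so that the written target matches what is spoken. The function name `expand_mai_yamok` is hypothetical, and the input is assumed to be already word-segmented, which for Thai requires a separate tokenizer not shown here.

```python
# Illustrative sketch only; not the normalization pipeline from the paper.
MAI_YAMOK = "\u0e46"  # THAI CHARACTER MAIYAMOK, the repetition mark


def expand_mai_yamok(tokens):
    """Replace each mai yamok with a copy of the word it repeats.

    `tokens` is assumed to be a list of pre-segmented Thai words. The mark
    may arrive as its own token or attached to the end of a word.
    """
    expanded = []
    for token in tokens:
        base = token.rstrip(MAI_YAMOK)      # word without trailing marks
        has_mark = len(base) < len(token)   # at least one mark was stripped
        if base:
            expanded.append(base)
        if has_mark:
            if expanded:
                # Repeat the most recent word once (assumed rule).
                expanded.append(expanded[-1])
            else:
                # A stray mark with nothing before it is left untouched.
                expanded.append(token)
    return expanded


if __name__ == "__main__":
    print(expand_mai_yamok(["เด็ก", "ๆ"]))
    print(expand_mai_yamok(["ดี", "มากๆ"]))
```

Under this assumed rule, the two spellings of the same utterance (mark as a separate token, or mark attached to the word) map to one token sequence, which is the kind of consistency in training targets the abstract describes.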