Thai Semantic End-of-Turn Detection for Real-Time Voice Agents

October 5, 2025
Authors: Thanapol Popit, Natthapath Rungseesiripak, Monthol Charattrakool, Saksorn Ruangtanusak
cs.AI

Abstract

Fluid voice-to-voice interaction requires reliable and low-latency detection of when a user has finished speaking. Traditional audio-silence end-pointers add hundreds of milliseconds of delay and fail under hesitations or language-specific phenomena. We present, to our knowledge, the first systematic study of Thai text-only end-of-turn (EOT) detection for real-time agents. We compare zero-shot and few-shot prompting of compact LLMs to supervised fine-tuning of lightweight transformers. Using transcribed subtitles from the YODAS corpus and Thai-specific linguistic cues (e.g., sentence-final particles), we formulate EOT as a binary decision over token boundaries. We report a clear accuracy-latency tradeoff and provide a public-ready implementation plan. This work establishes a Thai baseline and demonstrates that small, fine-tuned models can deliver near-instant EOT decisions suitable for on-device agents.
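The abstract frames EOT as a binary decision at each token boundary, with Thai sentence-final particles as one linguistic cue. The sketch below illustrates that framing under assumptions of our own; it is not the authors' pipeline. It assumes each subtitle turn is already word-segmented, labels only the boundary after the last token of a completed turn as positive (1) and all earlier boundaries as negative (0), and uses a hypothetical, non-exhaustive particle list (`SENTENCE_FINAL_PARTICLES`) for a naive rule baseline. The names `make_boundary_examples` and `particle_baseline` are illustrative.

```python
from dataclasses import dataclass

# Hypothetical, non-exhaustive set of Thai sentence-final particles (assumption).
SENTENCE_FINAL_PARTICLES = {"ครับ", "ค่ะ", "คะ", "นะ", "จ้ะ", "จ้า"}


@dataclass
class BoundaryExample:
    prefix: list[str]  # tokens transcribed so far
    label: int         # 1 = turn ends at this boundary, 0 = user still speaking


def make_boundary_examples(turn_tokens: list[str]) -> list[BoundaryExample]:
    """Label every token boundary of one complete, pre-segmented turn.

    Assumed labeling scheme: only the boundary after the final token is
    positive; every earlier boundary is a negative example.
    """
    n = len(turn_tokens)
    return [
        BoundaryExample(prefix=turn_tokens[: i + 1], label=int(i == n - 1))
        for i in range(n)
    ]


def particle_baseline(prefix: list[str]) -> int:
    """Naive rule: predict EOT (1) if the prefix ends in a listed particle."""
    return int(bool(prefix) and prefix[-1] in SENTENCE_FINAL_PARTICLES)


if __name__ == "__main__":
    # Example turn, assumed to be word-segmented upstream.
    turn = ["วันนี้", "อากาศ", "ดี", "มาก", "ครับ"]
    for ex in make_boundary_examples(turn):
        pred = particle_baseline(ex.prefix)
        print(" ".join(ex.prefix), "| gold:", ex.label, "| pred:", pred)
```

Running the script prints each prefix with its gold label and the rule's prediction; only the full turn ending in ครับ is labeled positive. A rule like this misfires on particles such as นะ that can also occur mid-utterance, so it serves only as a crude reference point next to the prompted and fine-tuned models the abstract describes.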
October 7, 2025