Better Late Than Never: Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

September 22, 2025
Authors: Peter Polák, Sara Papi, Luisa Bentivogli, Ondřej Bojar
cs.AI

Abstract

Simultaneous speech-to-text translation (SimulST) systems have to balance translation quality with latency, the delay between speech input and the translated output. While quality evaluation is well established, accurate latency measurement remains a challenge. Existing metrics often produce inconsistent or misleading results, especially in the widely used short-form setting, where speech is artificially presegmented. In this paper, we present the first comprehensive analysis of SimulST latency metrics across language pairs, systems, and both short- and long-form regimes. We uncover a structural bias in current metrics, related to segmentation, that undermines fair and meaningful comparisons. To address this, we introduce YAAL (Yet Another Average Lagging), a refined latency metric that delivers more accurate evaluations in the short-form regime. We extend YAAL to LongYAAL for unsegmented audio and propose SoftSegmenter, a novel resegmentation tool based on word-level alignment. Our experiments show that YAAL and LongYAAL outperform popular latency metrics, while SoftSegmenter enhances alignment quality in long-form evaluation, together enabling more reliable assessments of SimulST systems.
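
For readers unfamiliar with the family of metrics that YAAL refines, the following is a minimal Python sketch of the conventional Average Lagging (AL) computation for a single presegmented utterance, as it is commonly implemented for speech input. It is not YAAL or LongYAAL, whose definitions are not given in this abstract. The function name `average_lagging` and its parameter names are illustrative assumptions, not part of any library.

```python
from typing import Sequence


def average_lagging(
    delays_ms: Sequence[float], source_ms: float, reference_len: int
) -> float:
    """Conventional Average Lagging (AL) for one speech segment, in ms.

    delays_ms[i] is how much source audio had been read when target
    word i was emitted. Names here are illustrative, not a library API.
    """
    if not delays_ms or source_ms <= 0 or reference_len <= 0:
        raise ValueError("need at least one delay and positive lengths")

    # An ideal, zero-lag translator is assumed to emit reference words
    # evenly over the duration of the source audio.
    ms_per_word = source_ms / reference_len

    total = 0.0
    counted = 0
    for i, delay in enumerate(delays_ms):
        total += delay - i * ms_per_word  # lag of word i behind the ideal
        counted += 1
        # Stop at the first word emitted once the whole source was read;
        # later words no longer reflect a decision made while listening.
        if delay >= source_ms:
            break
    return total / counted


if __name__ == "__main__":
    # A system that stays one second behind the ideal schedule.
    print(average_lagging([1000, 2000, 3000, 4000], 4000, 4))
    # A system that waits for the whole segment before emitting anything.
    print(average_lagging([4000, 4000, 4000, 4000], 4000, 4))
```

Note that the cutoff in the loop depends on where the segment ends, so the same system output can score differently under different segmentations of the audio.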