

Better Late Than Never: Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

September 22, 2025
Authors: Peter Polák, Sara Papi, Luisa Bentivogli, Ondřej Bojar
cs.AI

Abstract

Simultaneous speech-to-text translation (SimulST) systems have to balance translation quality with latency, i.e., the delay between speech input and the translated output. While quality evaluation is well established, accurate latency measurement remains a challenge. Existing metrics often produce inconsistent or misleading results, especially in the widely used short-form setting, where speech is artificially presegmented. In this paper, we present the first comprehensive analysis of SimulST latency metrics across language pairs, systems, and both short- and long-form regimes. We uncover a structural bias in current metrics related to segmentation that undermines fair and meaningful comparisons. To address this, we introduce YAAL (Yet Another Average Lagging), a refined latency metric that delivers more accurate evaluations in the short-form regime. We extend YAAL to LongYAAL for unsegmented audio and propose SoftSegmenter, a novel resegmentation tool based on word-level alignment. Our experiments show that YAAL and LongYAAL outperform popular latency metrics, while SoftSegmenter enhances alignment quality in long-form evaluation, together enabling more reliable assessments of SimulST systems.
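As its name indicates, YAAL refines Average Lagging (AL), the classic SimulST latency metric. The abstract does not give YAAL's formula, so the sketch below shows only the classic AL baseline for one pre-segmented utterance, not YAAL itself. It assumes `delays[i]` is the amount of source audio (in ms) that had been read when target word `i` was emitted. The function name `average_lagging` is hypothetical and is not the authors' code or any library's API.

```python
from typing import Optional, Sequence


def average_lagging(delays: Sequence[float], source_length: float,
                    target_length: Optional[int] = None) -> float:
    """Classic Average Lagging (AL) for a single pre-segmented utterance.

    Hypothetical helper for illustration (not YAAL, not a library API).
    delays[i]: ms of source audio read when target word i (0-indexed) was emitted.
    source_length: duration of the segment in ms.
    target_length: length used for the ideal rate; defaults to the hypothesis
        length (some tools use the reference length instead).
    """
    if not delays:
        raise ValueError("need at least one emitted target word")
    if target_length is None:
        target_length = len(delays)
    # Ideal pace: ms of source per target word for a perfectly paced translator.
    rate = source_length / target_length
    total, tau = 0.0, 0
    for i, d in enumerate(delays):
        # Lag of word i behind the ideal translator, which emits it at i * rate.
        total += d - i * rate
        tau = i + 1
        # Cut-off: stop after the first word emitted once the whole segment was read.
        if d >= source_length:
            break
    return total / tau


# A system that emits one word per second of a 4 s segment lags by 1 s.
print(average_lagging([1000, 2000, 3000, 4000], source_length=4000))  # 1000.0
# A system that waits for the end of the segment lags by the full segment.
print(average_lagging([4000, 4000, 4000, 4000], source_length=4000))  # 4000.0
```

Both the ideal rate and the cut-off depend on `source_length`, so the same emission timeline can score differently depending on how the audio is segmented. This is a property of the classic formula. The abstract does not specify how YAAL changes it.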
September 24, 2025