REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation
August 7, 2025
Authors: Nameer Hirschkind, Joseph Liu, Mahesh Kumar Nandwana, Xiao Yu
cs.AI
Abstract
Simultaneous Speech Translation (SimulST) systems stream in audio while
simultaneously emitting translated text or speech. Such systems face the
significant challenge of balancing translation quality and latency. We
introduce a strategy to optimize this tradeoff: wait for more input only if you
gain information by doing so. Based on this strategy, we present Regularized
Entropy INformation Adaptation (REINA), a novel loss to train an adaptive
policy using an existing non-streaming translation model. We derive REINA from
information theory principles and show that REINA helps push the reported
Pareto frontier of the latency/quality tradeoff over prior works. Utilizing
REINA, we train a SimulST model on French, Spanish and German, both from and
into English. Training on only open source or synthetically generated data, we
achieve state-of-the-art (SOTA) streaming results for models of comparable
size. We also introduce a metric for streaming efficiency, quantitatively
showing REINA improves the latency/quality trade-off by as much as 21% compared
to prior approaches, normalized against non-streaming baseline BLEU scores.
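The core strategy, waiting for more input only when doing so yields additional information, can be illustrated with a minimal entropy-based read/write policy. The threshold rule and function names below are illustrative assumptions for exposition, not the REINA loss formulation itself:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_wait(probs, threshold=1.0):
    """Hypothetical adaptive policy sketch: if the translation model's
    next-token distribution is high-entropy (uncertain), waiting for more
    audio is likely to provide information, so WAIT; if the distribution
    is already confident, emit the token now. The fixed threshold is an
    assumption for illustration only.
    """
    return entropy(probs) > threshold

# Uniform distribution over 4 tokens: maximally uncertain, so wait.
print(should_wait([0.25, 0.25, 0.25, 0.25]))  # True
# Sharply peaked distribution: confident, so emit.
print(should_wait([0.97, 0.01, 0.01, 0.01]))  # False
```

In a trained adaptive policy such as the one the paper describes, this decision boundary would be learned from a non-streaming translation model rather than set by a hand-picked constant.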