REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation
August 7, 2025
Authors: Nameer Hirschkind, Joseph Liu, Mahesh Kumar Nandwana, Xiao Yu
cs.AI
Abstract
Simultaneous Speech Translation (SimulST) systems stream in audio while
simultaneously emitting translated text or speech. Such systems face the
significant challenge of balancing translation quality and latency. We
introduce a strategy to optimize this tradeoff: wait for more input only if you
gain information by doing so. Based on this strategy, we present Regularized
Entropy INformation Adaptation (REINA), a novel loss to train an adaptive
policy using an existing non-streaming translation model. We derive REINA from
information theory principles and show that REINA helps push the reported
Pareto frontier of the latency/quality tradeoff over prior works. Utilizing
REINA, we train a SimulST model on French, Spanish and German, both from and
into English. Training on only open source or synthetically generated data, we
achieve state-of-the-art (SOTA) streaming results for models of comparable
size. We also introduce a metric for streaming efficiency, quantitatively
showing REINA improves the latency/quality trade-off by as much as 21% compared
to prior approaches, normalized against non-streaming baseline BLEU scores.
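The core strategy stated above — emit output now unless waiting for more audio would yield an information gain — can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `entropy` threshold, the READ/WRITE action names, and the shape of the next-token distribution are hypothetical and not taken from the paper, which trains the policy with the REINA loss rather than a fixed threshold.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def decide(next_token_probs, threshold=1.0):
    """Hypothetical adaptive policy: if the translation model is uncertain
    (high entropy) about its next token given the audio received so far,
    READ more audio; otherwise WRITE the token now.
    `threshold` is an illustrative hyperparameter, not from the paper."""
    if entropy(next_token_probs) < threshold:
        return "WRITE"  # model is confident: emit without added latency
    return "READ"       # waiting for more input should reduce uncertainty

# A peaked distribution (confident model) triggers an immediate emission,
# while a near-uniform one (uncertain model) delays for more audio.
print(decide([0.9, 0.05, 0.05]))        # confident -> WRITE
print(decide([0.25, 0.25, 0.25, 0.25])) # uncertain -> READ
```

In the paper itself no hand-set threshold is needed: the REINA loss trains the policy end-to-end on top of an existing non-streaming translation model, so the wait/emit decision is learned rather than tuned.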