REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation

Abstract

Simultaneous Speech Translation (SimulST) systems stream in audio while simultaneously emitting translated text or speech. Such systems face the significant challenge of balancing translation quality and latency. We introduce a strategy to optimize this tradeoff: wait for more input only if you gain information by doing so. Based on this strategy, we present Regularized Entropy INformation Adaptation (REINA), a novel loss to train an adaptive policy using an existing non-streaming translation model. We derive REINA from information theory principles and show that REINA helps push the reported Pareto frontier of the latency/quality tradeoff over prior works. Utilizing REINA, we train a SimulST model on French, Spanish and German, both from and into English. Training on only open source or synthetically generated data, we achieve state-of-the-art (SOTA) streaming results for models of comparable size. We also introduce a metric for streaming efficiency, showing quantitatively that REINA improves the latency/quality tradeoff by as much as 21% compared to prior approaches, normalized against non-streaming baseline BLEU scores.
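The strategy "wait for more input only if you gain information by doing so" can be illustrated with a small entropy comparison. The following is a minimal sketch under stated assumptions, not the REINA loss itself: it assumes we can compare the translation model's next-token distribution given partial audio against the one given the full audio (something only available at training time), and it uses the entropy difference as a stand-in for information gain. The function names and the `threshold` value are hypothetical and do not come from the abstract.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def information_gain(partial_probs, full_probs):
    """Hypothetical proxy: how much next-token uncertainty drops when the
    model sees the full audio instead of only the audio streamed so far."""
    return entropy(partial_probs) - entropy(full_probs)


def should_wait(partial_probs, full_probs, threshold=0.1):
    """Wait for more audio only if doing so is expected to add information.

    `threshold` is an illustrative assumption, not a value from the source.
    """
    return information_gain(partial_probs, full_probs) > threshold


# Partial audio leaves the next token ambiguous; full audio resolves it.
ambiguous = [0.25, 0.25, 0.25, 0.25]
resolved = [0.97, 0.01, 0.01, 0.01]
wait_when_ambiguous = should_wait(ambiguous, resolved)

# More audio changes nothing, so the token can be emitted immediately.
wait_when_resolved = should_wait(resolved, resolved)
```

At inference time the full-audio distribution is unavailable, which is why the abstract describes training an adaptive policy rather than evaluating a rule like this directly; the sketch only shows the kind of signal such a policy could be trained to approximate.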
