Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

March 9, 2025
作者: Yingfeng Luo, Tong Zheng, Yongyu Mu, Bei Li, Qinghong Zhang, Yongqi Gao, Ziqiang Xu, Peinan Feng, Xiaoqian Liu, Tong Xiao, Jingbo Zhu
cs.AI

Abstract

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that results using our method match or surpass a range of baselines in terms of translation quality, but achieve 2.4∼6.5× inference speedups and a 75% reduction in the memory footprint of the KV cache. It also demonstrates strong generalization across a variety of translation-related tasks.
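
The abstract describes an encoder-decoder setup in which a pre-trained LLM replaces the NMT encoder and a conventional, much smaller NMT decoder is kept on top of it. The snippet below is a minimal sketch of that idea under assumed details, not the paper's implementation: the class name `LLMEncoderNMT`, the linear `adapter`, and every dimension are hypothetical choices made only to illustrate the wiring.

```python
# Minimal sketch (not the authors' code): a pre-trained LLM acts as the
# source-side encoder, and a small NMT Transformer decoder generates the
# target via cross-attention over the LLM's hidden states. All names and
# sizes here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel


class LLMEncoderNMT(nn.Module):
    def __init__(self, llm_name="gpt2", d_model=512, n_layers=6,
                 n_heads=8, vocab_size=32000, max_tgt_len=1024):
        super().__init__()
        # LLM used purely as an encoder; its own generation head is unused.
        self.llm = AutoModel.from_pretrained(llm_name)
        # Adapter projecting LLM states into the decoder's width -- one simple
        # way to make the LLM "work with" a standard NMT decoder.
        self.adapter = nn.Linear(self.llm.config.hidden_size, d_model)
        # Lightweight autoregressive NMT decoder with cross-attention.
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_tgt_len, d_model)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source once with the LLM; these states are reused at
        # every decoding step, so the per-step KV cache belongs to the small
        # decoder rather than to the LLM.
        enc = self.llm(input_ids=src_ids,
                       attention_mask=src_mask).last_hidden_state
        memory = self.adapter(enc)
        # Target-side embeddings plus learned positions, with a causal mask.
        tgt_len = tgt_ids.size(1)
        positions = torch.arange(tgt_len, device=tgt_ids.device)
        dec_in = self.tgt_embed(tgt_ids) + self.pos_embed(positions)
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt_len).to(tgt_ids.device)
        dec = self.decoder(dec_in, memory, tgt_mask=causal)
        return self.out_proj(dec)  # logits over the target vocabulary
```

In a layout like this, only the small decoder runs autoregressively at inference time, so its per-step compute and KV cache are a fraction of a decoder-only LLM's; that is the intuition behind the speedups and KV-cache memory reduction reported in the abstract.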
