Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

Abstract

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems with a single pre-trained Transformer decoder, while encoder-decoder architectures, the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well a machine translation system generalizes across tasks. Evaluations on WMT and on our dataset show that our method matches or surpasses a range of baselines in translation quality while achieving inference speedups of 2.4 to 6.5 times and a 75% reduction in the memory footprint of the KV cache. Our method also demonstrates strong generalization across a variety of translation-related tasks.
