Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

Abstract

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on WMT and our datasets show that our method matches or surpasses a range of baselines in translation quality while achieving 2.4 to 6.5 times inference speedups and a 75% reduction in the memory footprint of the KV cache. Our method also demonstrates strong generalization across a variety of translation-related tasks.
