A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Abstract

Generative Large Language Models (LLMs) have achieved remarkable advances in a variety of NLP tasks. These advances have not carried over to translation, however, particularly for LLMs of moderate size (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate-size LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for translation and eliminates the need for the abundant parallel data on which traditional translation models usually depend. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. We call the LLM developed through this strategy Advanced Language Model-based trAnslator (ALMA). With LLaMA-2 as the underlying model, our results show an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. With only 7B or 13B parameters, the model performs significantly better than all prior work and even outperforms the NLLB-54B model and GPT-3.5-text-davinci-003. This method establishes the foundation for a novel training paradigm in machine translation.
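
The two-stage recipe described in the abstract can be pictured as an ordered schedule in which the model produced by the first stage is the starting point for the second. The following is only a minimal sketch of that ordering, not the authors' training code: the names `Stage`, `TWO_STAGE_RECIPE`, `run_recipe`, and `fake_fine_tune` are hypothetical, and the stand-in `fake_fine_tune` function merely records which stage was applied instead of training anything.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    """One fine-tuning stage (hypothetical container, for illustration only)."""
    name: str       # label used for bookkeeping
    data_kind: str  # "monolingual" or "parallel", as named in the abstract


# Order matters: monolingual data first, then a small high-quality parallel set.
TWO_STAGE_RECIPE: List[Stage] = [
    Stage(name="stage1", data_kind="monolingual"),
    Stage(name="stage2", data_kind="parallel"),
]


def run_recipe(model: str, recipe: List[Stage],
               fine_tune: Callable[[str, Stage], str]) -> str:
    """Apply each stage in order, passing the result of one stage to the next."""
    for stage in recipe:
        model = fine_tune(model, stage)
    return model


def fake_fine_tune(model: str, stage: Stage) -> str:
    # Stand-in for a real training loop (assumption): it only appends the
    # kind of data the stage would have used, so the ordering is visible.
    return f"{model}+{stage.data_kind}"


if __name__ == "__main__":
    print(run_recipe("base", TWO_STAGE_RECIPE, fake_fine_tune))
```

In a real setting, `fine_tune` would be replaced by an actual training routine; the point of the sketch is only that the stages are sequential and that the second stage starts from the output of the first.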

