Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Abstract

Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT), even when trained without parallel data. Yet despite the enormous amount of training data, they still struggle to translate rare words, particularly for low-resource languages. Worse, it is usually unrealistic to retrieve relevant demonstrations for in-context learning with low-resource languages, which restricts the practical use of LLMs for translation. How should we mitigate this problem? To this end, we present a novel method, CoD, which augments LLMs with prior knowledge in the form of chains of multilingual dictionaries for a subset of the input words, in order to elicit their translation abilities. Extensive experiments indicate that augmenting ChatGPT with CoD yields large gains for MNMT on the full FLORES-200 devtest set, improving ChrF++ by up to 13x (from 3.08 to 42.63 for English to Serbian written in Cyrillic script). We further demonstrate the importance of chaining the multilingual dictionaries, as well as the superiority of CoD over few-shot demonstrations for low-resource languages.
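To make the idea of chained dictionary hints concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not the authors' implementation: the function names `build_chain` and `build_cod_prompt`, the "means" connective, the wording of the translation request, the choice of French and German as auxiliary languages, and the toy dictionary entries are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy dictionary: source word -> {language name: translation}.
# The entries are illustrative only and are not taken from the paper.
TOY_DICTIONARY: Dict[str, Dict[str, str]] = {
    "river": {"Serbian": "река", "French": "rivière", "German": "Fluss"},
    "bridge": {"Serbian": "мост", "French": "pont", "German": "Brücke"},
}


def build_chain(
    word: str,
    target_lang: str,
    auxiliary_langs: List[str],
    dictionary: Dict[str, Dict[str, str]],
) -> str:
    """Chain one word through the target and auxiliary languages.

    Returns an empty string when the word has no target-language entry.
    """
    entry = dictionary.get(word.lower())
    if entry is None or target_lang not in entry:
        return ""
    parts = [f'"{word}"', f'"{entry[target_lang]}"']
    # Append auxiliary-language translations in the given order, if present.
    parts += [f'"{entry[lang]}"' for lang in auxiliary_langs if lang in entry]
    return " means ".join(parts) + "."


def build_cod_prompt(
    sentence: str,
    source_lang: str,
    target_lang: str,
    auxiliary_langs: List[str],
    dictionary: Dict[str, Dict[str, str]],
) -> str:
    """Prepend dictionary chains for the covered subset of input words."""
    chains = []
    for token in sentence.split():
        word = token.strip(".,;:!?\"'")  # crude punctuation stripping
        chain = build_chain(word, target_lang, auxiliary_langs, dictionary)
        if chain:
            chains.append(chain)
    # Assumed request wording; the exact template is not given in the abstract.
    request = (
        f"Translate the following text from {source_lang} "
        f"into {target_lang}: {sentence}"
    )
    return "\n".join(chains + [request])


prompt = build_cod_prompt(
    "The bridge crosses the river.",
    "English",
    "Serbian",
    ["French", "German"],
    TOY_DICTIONARY,
)
print(prompt)
```

Only words found in the dictionary receive a chain, mirroring the abstract's description of hints for a subset of input words; the remaining words are left for the model to translate unaided.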
