Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Abstract

Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT), even when trained without parallel data. Yet despite the enormous amount of training data, they still struggle to translate rare words, particularly for low-resource languages. Worse, it is usually unrealistic to retrieve relevant demonstrations for in-context learning with low-resource languages, which restricts the practical use of LLMs for translation. How should we mitigate this problem? To this end, we present a novel method, CoD, which augments LLMs with prior knowledge in the form of chains of multilingual dictionary entries for a subset of the input words, in order to elicit the translation abilities of LLMs. Extensive experiments indicate that augmenting ChatGPT with CoD yields large gains for MNMT, up to a 13-fold increase in ChrF++ (from 3.08 to 42.63 for English to Serbian written in Cyrillic script) on the full FLORES-200 devtest set. We further demonstrate the importance of chaining the multilingual dictionaries, as well as the superiority of CoD over few-shot demonstrations for low-resource languages.
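To make the idea of chained multilingual dictionaries more concrete, the following is a minimal Python sketch of how such a prompt might be assembled. The abstract does not specify the prompt template, so everything here is an assumption made for illustration: the "means" chaining phrase, the wording of the translation instruction, the toy dictionary entries, the choice of auxiliary languages, and the helper names `build_chain` and `build_cod_prompt`. It should not be read as the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical toy dictionary: source word -> {language name: translation}.
# Entries and languages are illustrative only, not taken from the paper.
TOY_DICTIONARY: Dict[str, Dict[str, str]] = {
    "river": {"Serbian": "река", "German": "Fluss", "French": "rivière"},
    "bridge": {"Serbian": "мост", "German": "Brücke", "French": "pont"},
}


def build_chain(word: str, entry: Dict[str, str], languages: Sequence[str]) -> str:
    """Chain one word through the target and auxiliary languages.

    The '"a" means "b" means "c".' pattern is an assumed template.
    """
    parts = [word] + [entry[lang] for lang in languages if lang in entry]
    return " means ".join(f'"{p}"' for p in parts) + "."


def build_cod_prompt(
    sentence: str,
    source_lang: str,
    target_lang: str,
    auxiliary_langs: Sequence[str],
    dictionary: Dict[str, Dict[str, str]],
) -> str:
    """Prepend dictionary chains for the covered subset of input words."""
    languages = [target_lang] + list(auxiliary_langs)
    chains: List[str] = []
    seen = set()
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        # Only words found in the dictionary get a chain; each word once.
        if word in dictionary and word not in seen:
            seen.add(word)
            chains.append(build_chain(word, dictionary[word], languages))
    # Assumed instruction wording; the chains come first as prior knowledge.
    instruction = (
        f"Translate the following text from {source_lang} "
        f"into {target_lang}: {sentence}"
    )
    return "\n".join(chains + [instruction])


example_prompt = build_cod_prompt(
    "The bridge crosses the river.",
    "English",
    "Serbian",
    ["German", "French"],
    TOY_DICTIONARY,
)
```

In this sketch, `example_prompt` holds one chain line per covered word followed by the translation instruction, and that string would then be sent to the LLM. Words missing from the dictionary are simply left without a chain, which mirrors the abstract's description of covering only a subset of the input words.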
