Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
May 17, 2023
Authors: Eleftheria Briakou, Colin Cherry, George Foster
cs.AI
Abstract
Large, multilingual language models exhibit surprisingly good zero- or
few-shot machine translation capabilities, despite having never seen the
intentionally-included translation examples provided to typical neural
translation systems. We investigate the role of incidental bilingualism -- the
unintentional consumption of bilingual signals, including translation examples
-- in explaining the translation capabilities of large language models, taking
the Pathways Language Model (PaLM) as a case study. We introduce a mixed-method
approach to measure and understand incidental bilingualism at scale. We show
that PaLM is exposed to over 30 million translation pairs across at least 44
languages. Furthermore, the amount of incidental bilingual content is highly
correlated with the amount of monolingual in-language content for non-English
languages. We relate incidental bilingual content to zero-shot prompts and show
that it can be used to mine new prompts to improve PaLM's out-of-English
zero-shot translation quality. Finally, in a series of small-scale ablations,
we show that its presence has a substantial impact on translation capabilities,
although this impact diminishes with model scale.