Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
May 17, 2023
Authors: Eleftheria Briakou, Colin Cherry, George Foster
cs.AI
Abstract
Large, multilingual language models exhibit surprisingly good zero- or
few-shot machine translation capabilities, despite having never seen the
intentionally-included translation examples provided to typical neural
translation systems. We investigate the role of incidental bilingualism -- the
unintentional consumption of bilingual signals, including translation examples
-- in explaining the translation capabilities of large language models, taking
the Pathways Language Model (PaLM) as a case study. We introduce a mixed-method
approach to measure and understand incidental bilingualism at scale. We show
that PaLM is exposed to over 30 million translation pairs across at least 44
languages. Furthermore, the amount of incidental bilingual content is highly
correlated with the amount of monolingual in-language content for non-English
languages. We relate incidental bilingual content to zero-shot prompts and show
that it can be used to mine new prompts to improve PaLM's out-of-English
zero-shot translation quality. Finally, in a series of small-scale ablations,
we show that its presence has a substantial impact on translation capabilities,
although this impact diminishes with model scale.