Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
April 18, 2024
Authors: Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami
cs.AI
Abstract
Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it challenging to extend this framework to diverse languages. In this work, we evaluate a simple approach for zero-shot cross-lingual alignment, where a reward model is trained on preference data in one source language and directly applied to other target languages. On summarization and open-ended dialog generation, we show that this method is consistently successful under comprehensive evaluation settings, including human evaluation: cross-lingually aligned models are preferred by humans over unaligned models on up to >70% of evaluation instances. We moreover find that a different-language reward model sometimes yields better aligned models than a same-language reward model. We also identify best practices when there is no language-specific data for even supervised finetuning, another component in alignment.
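To make the transfer recipe described in the abstract concrete, below is a minimal, hypothetical sketch: a reward model trained only on source-language (e.g., English) preference data scores candidate generations produced in a target language, here used for simple best-of-n reranking. The checkpoint names `my-org/en-reward-model` and `my-org/multilingual-lm`, the scalar reward head, and the choice of best-of-n are illustrative assumptions, not details taken from the paper.

```python
# Sketch of zero-shot cross-lingual reward model transfer via best-of-n
# reranking (illustrative only; checkpoint names are hypothetical).
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Reward model trained on source-language (e.g., English) preference data,
# assumed to expose a scalar reward head (num_labels=1).
rm_name = "my-org/en-reward-model"  # hypothetical checkpoint
rm_tokenizer = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name)

# Generation model prompted in the target language (e.g., German).
lm_name = "my-org/multilingual-lm"  # hypothetical checkpoint
lm_tokenizer = AutoTokenizer.from_pretrained(lm_name)
lm = AutoModelForCausalLM.from_pretrained(lm_name)

def best_of_n(prompt: str, n: int = 8, max_new_tokens: int = 128) -> str:
    """Sample n target-language continuations and return the one that the
    source-language reward model scores highest."""
    inputs = lm_tokenizer(prompt, return_tensors="pt")
    outputs = lm.generate(
        **inputs,
        do_sample=True,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=lm_tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        lm_tokenizer.decode(out[prompt_len:], skip_special_tokens=True)
        for out in outputs
    ]

    # Score each (prompt, candidate) pair with the transferred reward model,
    # even though the text is not in the reward model's training language.
    scores = []
    for cand in candidates:
        rm_inputs = rm_tokenizer(prompt, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(reward_model(**rm_inputs).logits.squeeze().item())

    return candidates[max(range(n), key=lambda i: scores[i])]

print(best_of_n("Fasse den folgenden Artikel zusammen: ..."))
```

The same transferred reward model could instead supply the reward signal for RL-based alignment; best-of-n is used here only because it keeps the sketch short and self-contained.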