Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
October 2, 2024
Authors: Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu
cs.AI
Abstract
Model merging, such as model souping, is the practice of combining different
models with the same architecture together without further training. In this
work, we present a model merging methodology that addresses the difficulty of
fine-tuning Large Language Models (LLMs) for target tasks in non-English
languages, where task-specific data is often unavailable. We focus on
mathematical reasoning and, without in-language math data, facilitate
cross-lingual transfer by composing language and math capabilities. Starting
from the same pretrained model, we fine-tune separate "experts" on math
instruction data in English and on generic instruction data in the target
language. We then replace the top and bottom transformer layers of the math
expert directly with layers from the language expert, which consequently
enhances math performance in the target language. The resulting merged models
outperform the individual experts and other merging methods on the math
benchmark, MGSM, by 10% across four major languages where math instruction data
is scarce. In addition, this layer swapping is simple, inexpensive, and
intuitive, as it is based on an interpretative analysis of the most important
parameter changes during the fine-tuning of each expert. The ability to
successfully re-compose LLMs for cross-lingual transfer in this manner opens up
future possibilities to combine model expertise, create modular solutions, and
transfer reasoning capabilities across languages all post hoc.
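To make the layer-swapping recipe concrete, below is a minimal sketch of how such a merge could be performed on two fine-tuned checkpoints. It assumes a LLaMA-style architecture whose decoder blocks are named `model.layers.{i}`; the swap counts `n_bottom` and `n_top`, the `swap_layers` helper, and all paths are illustrative placeholders, not the paper's released implementation.

```python
# Minimal sketch of the layer-swapping idea described in the abstract.
# Assumptions (not taken from the paper's code): both experts were fine-tuned
# from the same base checkpoint, the model follows a LLaMA-style layout with
# decoder blocks named "model.layers.{i}.", and the number of swapped top and
# bottom blocks is illustrative.
import re
import torch
from transformers import AutoModelForCausalLM

def swap_layers(math_expert_path, lang_expert_path, n_bottom=4, n_top=4):
    math_model = AutoModelForCausalLM.from_pretrained(math_expert_path, torch_dtype=torch.bfloat16)
    lang_model = AutoModelForCausalLM.from_pretrained(lang_expert_path, torch_dtype=torch.bfloat16)

    num_layers = math_model.config.num_hidden_layers
    # Indices of the bottom and top transformer blocks to take from the language expert.
    swap_ids = set(range(n_bottom)) | set(range(num_layers - n_top, num_layers))

    merged_state = math_model.state_dict()
    for name, tensor in lang_model.state_dict().items():
        match = re.match(r"model\.layers\.(\d+)\.", name)
        if match and int(match.group(1)) in swap_ids:
            # Overwrite the math expert's weights with the language expert's for swapped blocks.
            merged_state[name] = tensor

    math_model.load_state_dict(merged_state)
    return math_model  # math expert with the language expert's top and bottom layers

# Example usage (paths are placeholders):
# merged = swap_layers("path/to/math-expert", "path/to/lang-expert")
# merged.save_pretrained("path/to/merged-model")
```

Because the swap is a pure parameter copy between models that share an architecture and pretrained initialization, it requires no further training and can be applied post hoc to any pair of such experts.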