Magpie：通过提示对齐的LLMs从零开始进行对齐数据综合

摘要

高质量的指导数据对齐大型语言模型（LLMs）至关重要。尽管一些模型，如Llama-3-Instruct，具有开放权重，但它们的对齐数据仍然保持私密，这阻碍了人工智能的民主化。高昂的人力成本和有限的、预定义的提示范围阻碍了现有开源数据创建方法的有效扩展，可能限制了公共对齐数据集的多样性和质量。通过直接从对齐的LLM中提取，能否合成大规模高质量的指导数据？我们提出了一种用于生成大规模对齐数据的自我合成方法，称为Magpie。我们的关键观察是，像Llama-3-Instruct这样的对齐LLMs可以在我们仅输入左侧模板直到保留给用户消息位置时生成用户查询，这要归功于它们的自回归特性。我们利用这种方法提示Llama-3-Instruct，并生成了400万条指导以及它们对应的响应。我们对提取的数据进行了全面分析，并选择了30万个高质量实例。为了将Magpie数据与其他公共指导数据集进行比较，我们使用每个数据集对Llama-3-8B-Base进行微调，并评估微调模型的性能。我们的结果表明，在某些任务中，使用Magpie微调的模型的性能与官方Llama-3-8B-Instruct相当，尽管后者通过受监督微调（SFT）和随后的反馈学习增强了1000万数据点。我们还表明，仅使用Magpie进行SFT可以超越以往用于SFT和偏好优化的公共数据集的性能，例如使用UltraFeedback进行直接偏好优化。这种优势在对齐基准测试中明显，如AlpacaEval、ArenaHard和WildBench。

English

High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.

Magpie：通过提示对齐的LLMs从零开始进行对齐数据综合

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

摘要

Support