I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

Abstract

Large Language Models (LLMs) have achieved significant advancements; however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learning and alignment. Some approaches train LLMs on synthetic data generated by the models themselves, exploring the possibility of active alignment. However, a large gap remains between these one-time alignment methods and the continuous, automatic alignment that humans undergo. In this paper, we introduce I-SHEEP, an Iterative Self-EnHancEmEnt Paradigm. This human-like paradigm enables LLMs to continuously self-align entirely from scratch. Compared with the one-time alignment method Dromedary (sun2023principledriven), which corresponds to the first iteration in this paper, I-SHEEP significantly enhances the capabilities of both Qwen and Llama models. On the Qwen-1.5 72B model, I-SHEEP achieves, over subsequent iterations, a maximum relative improvement of 78.2% on Alpaca Eval and 24.0% on MT Bench, and an absolute increase of 8.88% in IFEval accuracy. Additionally, I-SHEEP surpasses the base model on various standard benchmark generation tasks, achieving average improvements of 24.77% on code generation tasks, 12.04% on TriviaQA, and 20.29% on SQuAD. We also provide new insights based on the experimental results. Our code, datasets, and models are available at https://anonymous.4open.science/r/I-SHEEP.
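The abstract describes the paradigm only at a high level: the model repeatedly produces its own training data and is trained on it, and a single round corresponds to a one-time method such as Dromedary. The Python sketch below illustrates that loop under stated assumptions and is not the I-SHEEP implementation. `ToyModel`, `toy_generate`, `toy_fine_tune`, and `iterative_self_enhancement` are hypothetical names. The abstract does not specify how data are generated or filtered, or whether each round fine-tunes the previous checkpoint or the base model; this sketch omits any self-assessment or filtering step and assumes each round continues from the previous checkpoint.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for an LLM checkpoint (not part of I-SHEEP)."""
    iteration: int
    seen: Tuple[str, ...] = ()  # all synthetic examples trained on so far


def iterative_self_enhancement(
    base_model: ToyModel,
    generate: Callable[[ToyModel], List[str]],
    fine_tune: Callable[[ToyModel, List[str]], ToyModel],
    num_iterations: int,
) -> List[ToyModel]:
    """Return every checkpoint, starting with the untouched base model."""
    checkpoints = [base_model]
    model = base_model
    for _ in range(num_iterations):
        # The current model produces its own training data (no human labels).
        synthetic = generate(model)
        # Assumption: each round continues from the previous checkpoint.
        model = fine_tune(model, synthetic)
        checkpoints.append(model)
    return checkpoints


# Hypothetical toy callables standing in for real sampling and fine-tuning.
def toy_generate(model: ToyModel) -> List[str]:
    return [f"iter{model.iteration}-sample{i}" for i in range(2)]


def toy_fine_tune(model: ToyModel, data: List[str]) -> ToyModel:
    return ToyModel(model.iteration + 1, model.seen + tuple(data))


ckpts = iterative_self_enhancement(ToyModel(0), toy_generate, toy_fine_tune, 3)
one_time = ckpts[1]  # a single round corresponds to one-time alignment
print(len(ckpts), ckpts[-1].iteration, len(ckpts[-1].seen))
```

With `num_iterations=1` the loop reduces to one-time alignment; larger values give the continuous setting that the abstract contrasts it with.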