

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

August 15, 2024
Authors: Yiming Liang, Ge Zhang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, Xinrun Du, Zhenzhu Yang, Jiaheng Liu, Chenghua Lin, Lei Ma, Wenhao Huang, Jiajun Zhang
cs.AI

Abstract

Large Language Models (LLMs) have achieved significant advancements; however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learning and alignment. Some approaches train LLMs on their own generated synthetic data, exploring the possibility of active alignment. However, a large gap remains between these one-time alignment methods and the continuous, automatic alignment of humans. In this paper, we introduce I-SHEEP, an Iterative Self-EnHancEmEnt Paradigm. This human-like paradigm enables LLMs to continuously self-align from scratch, with no external data. Compared to the one-time alignment method Dromedary (Sun et al., 2023), which corresponds to the first iteration in this paper, I-SHEEP significantly enhances the capabilities of both Qwen and Llama models. Over subsequent iterations on the Qwen-1.5 72B model, I-SHEEP achieves a maximum relative improvement of 78.2% on Alpaca Eval, 24.0% on MT Bench, and an absolute increase of 8.88% in IFEval accuracy. Additionally, I-SHEEP surpasses the base model on various standard benchmark generation tasks, achieving an average improvement of 24.77% on code generation tasks, 12.04% on TriviaQA, and 20.29% on SQuAD. We also provide new insights based on the experimental results. Our code, datasets, and models are available at https://anonymous.4open.science/r/I-SHEEP.
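The iterative self-enhancement loop described in the abstract (the model generates instruction-response pairs, assesses its own outputs, filters by that self-assessment, fine-tunes on the filtered set, and repeats) can be sketched as below. This is a minimal illustration, not the paper's implementation: every function body is a hypothetical stub, and names such as `generate_pairs`, `self_assess`, and `fine_tune` are placeholders standing in for LLM generation, LLM self-scoring, and supervised fine-tuning.

```python
import random

def generate_pairs(model, seeds, n):
    """Stub: self-generate n instruction-response pairs from seed prompts."""
    return [(f"{random.choice(seeds)} #{i}", f"{model['name']} answer {i}")
            for i in range(n)]

def self_assess(model, pair):
    """Stub: the model scores its own output in [0, 1] (random here)."""
    return random.random()

def fine_tune(model, data):
    """Stub: stand-in for supervised fine-tuning on the filtered data."""
    return {"name": model["name"],
            "iteration": model["iteration"] + 1,
            "train_size": len(data)}

def i_sheep(model, seeds, iterations=3, n_samples=100, threshold=0.5):
    """One round per iteration: generate -> self-assess -> filter -> train."""
    for _ in range(iterations):
        pairs = generate_pairs(model, seeds, n_samples)
        kept = [p for p in pairs if self_assess(model, p) >= threshold]
        # The next iteration starts from the model updated on its own kept data.
        model = fine_tune(model, kept)
    return model

model = i_sheep({"name": "base-llm", "iteration": 0},
                seeds=["Explain X", "Summarize Y"])
print(model["iteration"])  # 3
```

The key design point the sketch captures is that, unlike one-time alignment (a single pass of this loop, as in Dromedary), the updated model of each round becomes the generator and assessor for the next round.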

