Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
February 12, 2024
Authors: Haoyu Wang, Guozheng Ma, Ziqiao Meng, Zeyu Qin, Li Shen, Zhong Zhang, Bingzhe Wu, Liu Liu, Yatao Bian, Tingyang Xu, Xueqian Wang, Peilin Zhao
cs.AI
Abstract
Self-alignment is an effective way to reduce the cost of human annotation
while ensuring promising model capability. However, most current methods
complete the data collection and training steps in a single round, which may
overlook the continuously improving ability of self-aligned models. This raises
a key question: what if we perform multi-round bootstrapping self-alignment?
Does this strategy enhance model performance or lead to rapid degradation? In
this paper, our pioneering exploration delves into the impact of bootstrapping
self-alignment on large language models. Our findings reveal that bootstrapping
self-alignment markedly surpasses the single-round approach by guaranteeing
data diversity from in-context learning. To further exploit the capabilities of
bootstrapping, we investigate and adjust the training order of data, which
yields improved performance of the model. Drawing on these findings, we propose
Step-On-Feet Tuning (SOFT), which leverages the model's continuously enhanced
few-shot ability to boost zero-shot and one-shot performance. Building on an
easy-to-hard training recipe, we propose SOFT+, which further boosts
self-alignment performance. Our experiments demonstrate the efficiency of
SOFT (SOFT+) across
various classification and generation tasks, highlighting the potential of
bootstrapping self-alignment in continually enhancing model alignment
performance.
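
The abstract describes an iterative loop: in each round the current model's few-shot (in-context) ability generates new alignment data, the model is fine-tuned on that data, and the improved model seeds the next round, with SOFT+ additionally ordering each round's data from easy to hard. The sketch below illustrates that loop under stated assumptions; the helper callables (generate_with_icl, finetune), the difficulty score, and the round/demonstration parameters are illustrative placeholders, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Example:
    prompt: str
    response: str
    difficulty: float  # lower = easier (e.g., a heuristic or model-based score)


def bootstrapping_self_alignment(
    model: object,                       # opaque handle to the current policy
    prompts: Sequence[str],
    generate_with_icl: Callable[[object, str, List[Example]], Example],  # hypothetical helper
    finetune: Callable[[object, List[Example]], object],                 # hypothetical helper
    demo_pool: List[Example],
    n_rounds: int = 3,
    demos_per_prompt: int = 4,
    easy_to_hard: bool = True,           # SOFT+-style ordering within each round
) -> object:
    """Minimal sketch of multi-round bootstrapping self-alignment (SOFT / SOFT+).

    Each round: (1) sample in-context demonstrations, (2) let the current model
    answer the prompts few-shot, (3) fine-tune on the collected pairs so the
    improved model seeds the next round.
    """
    for _ in range(n_rounds):
        collected: List[Example] = []
        for prompt in prompts:
            # Re-sampling demonstrations each round keeps the in-context data
            # diverse, which the abstract identifies as key to avoiding degradation.
            demos = random.sample(demo_pool, k=min(demos_per_prompt, len(demo_pool)))
            collected.append(generate_with_icl(model, prompt, demos))

        if easy_to_hard:
            # SOFT+ idea: train on easier examples first within the round.
            collected.sort(key=lambda ex: ex.difficulty)

        model = finetune(model, collected)
        # Freshly generated data can also refresh the demonstration pool.
        demo_pool = demo_pool + collected

    return model
```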