CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom

March 3, 2025
作者: Yisen Li, Lingfeng Yang, Wenxuan Shen, Pan Zhou, Yao Wan, Weiwei Lin, Dongping Chen
cs.AI

Abstract

Distilling advanced Large Language Models' instruction-following capabilities into smaller models using a selected subset of data has become a mainstream approach in model training. Existing synthetic instruction data selection strategies rely mainly on single-dimensional signals (e.g., reward scores, model perplexity) and fail to capture the complexity of instruction-following across diverse fields. We therefore investigate more diverse signals to capture comprehensive characteristics of instruction-response pairs and propose three foundational metrics that leverage Multi-LLM wisdom, informed by (1) diverse LLM responses and (2) reward model assessment. Building on these base metrics, we propose CrowdSelect, an integrated metric that incorporates a clustering-based approach to maintain response diversity. Our comprehensive experiments demonstrate that the foundational metrics consistently improve performance across four base models on MT-bench and Arena-Hard. CrowdSelect, which efficiently incorporates all of the metrics, achieves state-of-the-art performance in both full and LoRA fine-tuning, with improvements of 4.81% on Arena-Hard and 11.1% on MT-bench for Llama-3.2-3b-instruct. We hope our findings will bring valuable insights to future research in this direction. Code is available at https://github.com/listentm/crowdselect.
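To make the overall idea concrete, the following Python sketch illustrates the general pattern the abstract describes: scoring instruction-response pairs with several signals derived from multiple LLM responses and a reward model, then selecting top-scoring pairs per embedding cluster to preserve diversity. The specific signals, weights, and helper names here (mean_reward, disagreement, margin, zscore) are illustrative placeholders, not the paper's actual metric definitions; see the linked repository for the real implementation.

# Hedged sketch: multi-signal scoring + cluster-based diverse selection.
# Signals and weights below are placeholders, NOT CrowdSelect's exact metrics.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy setup: 200 instruction-response pairs, each with reward-model scores
# for responses from 5 different LLMs, plus an instruction embedding.
n_pairs, n_llms, dim = 200, 5, 64
reward_scores = rng.normal(size=(n_pairs, n_llms))  # reward model scores per LLM response
embeddings = rng.normal(size=(n_pairs, dim))        # instruction embeddings from any encoder

# Placeholder base signals built from multi-LLM responses + reward assessment.
mean_reward = reward_scores.mean(axis=1)                         # average response quality
disagreement = reward_scores.std(axis=1)                         # spread of quality across LLMs
margin = reward_scores.max(axis=1) - reward_scores.min(axis=1)   # best-vs-worst gap

def zscore(x):
    # Normalize each signal before combining so no single scale dominates.
    return (x - x.mean()) / (x.std() + 1e-8)

# Combine the signals; equal weights are used purely for illustration.
combined = zscore(mean_reward) + zscore(disagreement) + zscore(margin)

# Cluster instructions and keep the top-scoring pairs within each cluster,
# so the selected subset stays diverse instead of collapsing onto one topic.
n_clusters, per_cluster = 10, 5
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)

selected = []
for c in range(n_clusters):
    idx = np.where(labels == c)[0]
    top = idx[np.argsort(combined[idx])[::-1][:per_cluster]]
    selected.extend(top.tolist())

print(f"Selected {len(selected)} pairs for fine-tuning, e.g.", sorted(selected)[:10])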

