AlpaGasus: Training A Better Alpaca with Fewer Data
July 17, 2023
Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin
cs.AI
Abstract
Large language models (LLMs) obtain instruction-following capability through
instruction-finetuning (IFT) on supervised instruction/response data. However,
widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many
low-quality instances with incorrect or irrelevant responses, which are
misleading and detrimental to IFT. In this paper, we propose a simple and
effective data selection strategy that automatically identifies and removes
low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce
AlpaGasus, which is finetuned on only 9k high-quality data filtered from the
52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as
evaluated by GPT-4 on multiple test sets and its 13B variant matches >90%
performance of its teacher LLM (i.e., Text-Davinci-003) on test tasks. It also
provides 5.7x faster training, reducing the training time for a 7B variant from
80 minutes (for Alpaca) to 14 minutes. We apply IFT for the same number of
epochs as Alpaca (7B) but on fewer data, using 4x NVIDIA A100 (80GB) GPUs and
following the original Alpaca setting and hyperparameters.
Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be
generally applied to instruction-tuning data, leading to faster training and
better instruction-following models. Our project page is available at:
https://lichang-chen.github.io/AlpaGasus/.
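
The abstract describes the filtering step only at a high level: a strong LLM (e.g., ChatGPT) grades each instruction/response pair, and only high-quality examples are kept for IFT. The sketch below illustrates that idea, assuming an Alpaca-style JSON dataset, an illustrative grading prompt, a 1-5 scoring scale, and a keep-threshold of 4.5; these specifics are assumptions for illustration and are not taken from the abstract.

```python
"""Minimal sketch of LLM-based data filtering in the spirit of AlpaGasus.

Assumptions (not from the abstract): the grading prompt, the 1-5 scale,
the model name, and the 4.5 keep-threshold are illustrative only.
"""
import json
import re

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GRADING_PROMPT = (
    "You are grading the quality of a response to an instruction.\n"
    "Instruction: {instruction}\n"
    "Input: {input}\n"
    "Response: {response}\n"
    "Rate the accuracy and helpfulness of the response on a scale of 1 to 5. "
    "Reply with only the number."
)


def score_example(example: dict) -> float:
    """Ask a strong LLM (here ChatGPT) to grade one instruction/response pair."""
    prompt = GRADING_PROMPT.format(
        instruction=example["instruction"],
        input=example.get("input", ""),
        response=example["output"],
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed grader; swap in any strong LLM
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = reply.choices[0].message.content
    match = re.search(r"\d+(\.\d+)?", text)
    return float(match.group()) if match else 0.0


def filter_dataset(path_in: str, path_out: str, threshold: float = 4.5) -> None:
    """Keep only examples whose LLM-assigned score clears the threshold."""
    with open(path_in) as f:
        data = json.load(f)  # Alpaca-style list of {"instruction", "input", "output"}
    kept = [ex for ex in data if score_example(ex) >= threshold]
    with open(path_out, "w") as f:
        json.dump(kept, f, indent=2)


if __name__ == "__main__":
    # Hypothetical file names; the filtered subset is then used for IFT as usual.
    filter_dataset("alpaca_data.json", "alpaca_filtered.json")
```

The filtered subset can then be fed to the standard Alpaca finetuning recipe unchanged, which is what makes the approach a drop-in, data-centric modification to IFT rather than a new training algorithm.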