

AgentInstruct: Toward Generative Teaching with Agentic Flows

July 3, 2024
作者: Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, Andres Codas, Yadong Lu, Wei-ge Chen, Olga Vrousgos, Corby Rosset, Fillipe Silva, Hamed Khanpour, Yash Lara, Ahmed Awadallah
cs.AI

Abstract

Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers have also raised concerns around model collapse and the drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually requires significant human effort in curating the data. We focus on using synthetic data for post-training, specifically creating data with powerful models to teach a new skill or behavior to another model; we refer to this setting as Generative Teaching. We introduce AgentInstruct, an extensible agentic framework for automatically creating large amounts of diverse and high-quality synthetic data. AgentInstruct can create both the prompts and responses, using only raw data sources like text documents and code files as seeds. We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25M pairs to teach language models different skills, such as text editing, creative writing, tool usage, coding, reading comprehension, etc. The dataset can be used for instruction tuning of any base model. We post-train Mistral-7b with the data. When comparing the resulting model Orca-3 to Mistral-7b-Instruct (which uses the same base model), we observe significant improvements across many benchmarks: for example, 40% improvement on AGIEval, 19% improvement on MMLU, 54% improvement on GSM8K, 38% improvement on BBH, and 45% improvement on AlpacaEval. Additionally, it consistently outperforms other models such as LLAMA-8B-instruct and GPT-3.5-turbo.
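
The abstract does not spell out how an agentic flow turns raw seeds into training pairs, so the following is a minimal illustrative sketch only, not the paper's implementation: a chain of role-prompted "agents" transforms a seed document into a prompt, and a teacher model then produces the response. The names call_llm, Agent, and agentic_flow are hypothetical placeholders introduced here for illustration.

```python
# Hypothetical sketch of an agentic synthetic-data flow: a raw seed document is
# transformed, turned into an instruction, and refined by successive role-prompted
# "agents"; a teacher model then answers the final prompt to yield one training pair.
# All names below are illustrative placeholders, not the paper's API.
from dataclasses import dataclass
from typing import List


def call_llm(prompt: str) -> str:
    """Placeholder for a call to a powerful teacher model (e.g. via an API)."""
    return f"<model output for: {prompt[:40]}...>"


@dataclass
class Agent:
    """One step in the flow: a role-specific instruction applied to its input text."""
    role_prompt: str

    def run(self, text: str) -> str:
        return call_llm(f"{self.role_prompt}\n\n{text}")


def agentic_flow(seed_text: str, agents: List[Agent]) -> dict:
    """Chain agents over a raw seed to build a prompt, then generate a response,
    producing a single (prompt, response) pair for post-training data."""
    intermediate = seed_text
    for agent in agents:
        intermediate = agent.run(intermediate)
    prompt = intermediate
    response = call_llm(prompt)
    return {"prompt": prompt, "response": response}


if __name__ == "__main__":
    flow = [
        Agent("Transform this raw text into a passage suitable for a reading task."),
        Agent("Write a challenging reading-comprehension question about the passage."),
        Agent("Rewrite the question so it is harder but unambiguous."),
    ]
    pair = agentic_flow("Raw seed: an excerpt from a text document or code file...", flow)
    print(pair)
```

Running different flows of this kind over many seed documents, one per target skill (text editing, coding, tool use, and so on), is one way such a pipeline could scale to millions of diverse pairs; the exact flows used by AgentInstruct are described in the paper itself.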
