UltraIF: Advancing Instruction Following from the Wild
February 6, 2025
Authors: Kaikai An, Li Sheng, Ganqu Cui, Shuzheng Si, Ning Ding, Yu Cheng, Baobao Chang
cs.AI
Abstract
Instruction following has made modern large language models (LLMs) helpful assistants. However, the key to taming LLMs on complex instructions remains mysterious, as there are huge gaps between models trained by the open-source community and those trained by leading companies. To bridge the gap, we propose UltraIF, a simple and scalable approach for building LLMs that can follow complex instructions with open-source data. UltraIF first decomposes real-world user prompts into simpler queries, constraints, and corresponding evaluation questions for the constraints. Then, we train an UltraComposer to compose constraint-associated prompts with evaluation questions. This prompt composer allows us to synthesize complicated instructions as well as filter responses with the evaluation questions. In our experiments, for the first time, we successfully align LLaMA-3.1-8B-Base to catch up with its instruct version on 5 instruction-following benchmarks without any benchmark information, using only an 8B model as response generator and evaluator. The aligned model also achieves competitive scores on other benchmarks. Moreover, we show that UltraIF can further improve LLaMA-3.1-8B-Instruct through self-alignment, motivating broader use cases for the method. Our code will be available at https://github.com/kkk-an/UltraIF.
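
To make the pipeline concrete, below is a minimal Python sketch of the decompose-compose-filter loop the abstract describes. Every name in it (`decompose`, `compose`, `generate_and_filter`, the `llm` callable, and the prompt wording and output formats) is a hypothetical illustration under assumed conventions, not the authors' API; the actual implementation lives in the linked repository.

```python
from typing import Callable, List, Tuple

# Stand-in for any text-in/text-out chat model (e.g. an 8B instruct model
# used as both generator and evaluator, as in the abstract). Hypothetical.
LLM = Callable[[str], str]


def decompose(llm: LLM, user_prompt: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Step 1: split a real-world prompt into a simpler query plus
    (constraint, evaluation-question) pairs. The output format is assumed."""
    raw = llm(
        "Rewrite the request below as: line 1, the plain query; then one "
        "'constraint | yes/no evaluation question' per line.\n\n" + user_prompt
    )
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    query, pairs = lines[0], []
    for line in lines[1:]:
        if " | " in line:
            constraint, question = line.split(" | ", 1)
            pairs.append((constraint, question))
    return query, pairs


def compose(composer: LLM, query: str) -> Tuple[str, str]:
    """Step 2: an UltraComposer-style model adds one constraint to a query
    and emits the matching evaluation question ('prompt | question')."""
    out = composer(
        "Add one constraint to this query and give a yes/no check for it, "
        "formatted as 'prompt | question':\n" + query
    )
    prompt, question = out.split(" | ", 1)
    return prompt, question


def generate_and_filter(llm: LLM, judge: LLM, prompt: str,
                        questions: List[str], n: int = 4) -> List[str]:
    """Step 3: sample n responses and keep only those for which the judge
    answers 'yes' to every evaluation question."""
    kept = []
    for _ in range(n):
        response = llm(prompt)
        if all(
            judge(f"Response:\n{response}\n\nQuestion: {q}\nAnswer yes or no.")
            .strip().lower().startswith("yes")
            for q in questions
        ):
            kept.append(response)
    return kept
```

One natural reading of how "complicated instructions" would be synthesized is to apply `compose` repeatedly, accumulating constraints and their evaluation questions before running the generate-and-filter step.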