UltraIF: Advancing Instruction Following from the Wild
February 6, 2025
Authors: Kaikai An, Li Sheng, Ganqu Cui, Shuzheng Si, Ning Ding, Yu Cheng, Baobao Chang
cs.AI
Abstract
Instruction following has made modern large language models (LLMs) helpful assistants. However, the key to taming LLMs on complex instructions remains mysterious, as there are huge gaps between models trained by the open-source community and those trained by leading companies. To bridge the gap, we propose UltraIF, a simple and scalable approach for building LLMs that can follow complex instructions with open-source data. UltraIF first decomposes real-world user prompts into simpler queries, constraints, and corresponding evaluation questions for the constraints. Then, we train an UltraComposer to compose constraint-associated prompts with evaluation questions. This prompt composer allows us to synthesize complicated instructions as well as filter responses with the evaluation questions. In our experiments, for the first time, we successfully align LLaMA-3.1-8B-Base to catch up with its instruct version on 5 instruction-following benchmarks without any benchmark information, using only an 8B model as response generator and evaluator. The aligned model also achieves competitive scores on other benchmarks. Moreover, we show that UltraIF can further improve LLaMA-3.1-8B-Instruct through self-alignment, motivating broader use cases for the method. Our code will be available at https://github.com/kkk-an/UltraIF.
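
To make the pipeline concrete, below is a minimal Python sketch of the decompose-compose-filter loop the abstract describes. Every name in it (`decompose`, `compose`, `generate_and_filter`, the `llm` callable, and the prompt wording and output formats) is a hypothetical illustration under assumed conventions, not the authors' API; the actual implementation lives in the linked repository.

```python
from typing import Callable, List, Tuple

# Stand-in for any text-in/text-out chat model (e.g. an 8B instruct model
# used as both generator and evaluator, as in the abstract). Hypothetical.
LLM = Callable[[str], str]


def decompose(llm: LLM, user_prompt: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Step 1: split a real-world prompt into a simpler query plus
    (constraint, evaluation-question) pairs. The output format is assumed."""
    raw = llm(
        "Rewrite the request below as: line 1, the plain query; then one "
        "'constraint | yes/no evaluation question' per line.\n\n" + user_prompt
    )
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    query, pairs = lines[0], []
    for line in lines[1:]:
        if " | " in line:
            constraint, question = line.split(" | ", 1)
            pairs.append((constraint, question))
    return query, pairs


def compose(composer: LLM, query: str) -> Tuple[str, str]:
    """Step 2: an UltraComposer-style model adds one constraint to a query
    and emits the matching evaluation question ('prompt | question')."""
    out = composer(
        "Add one constraint to this query and give a yes/no check for it, "
        "formatted as 'prompt | question':\n" + query
    )
    prompt, question = out.split(" | ", 1)
    return prompt, question


def generate_and_filter(llm: LLM, judge: LLM, prompt: str,
                        questions: List[str], n: int = 4) -> List[str]:
    """Step 3: sample n responses and keep only those for which the judge
    answers 'yes' to every evaluation question."""
    kept = []
    for _ in range(n):
        response = llm(prompt)
        if all(
            judge(f"Response:\n{response}\n\nQuestion: {q}\nAnswer yes or no.")
            .strip().lower().startswith("yes")
            for q in questions
        ):
            kept.append(response)
    return kept
```

One natural reading of how "complicated instructions" would be synthesized is to apply `compose` repeatedly, accumulating constraints and their evaluation questions before running the generate-and-filter step.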