Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following
November 13, 2025
作者: Yun He, Wenzhe Li, Hejia Zhang, Songlin Li, Karishma Mandyam, Sopan Khosla, Yuanhao Xiong, Nanshu Wang, Selina Peng, Beibin Li, Shengjie Bi, Shishir G. Patil, Qi Qi, Shengyu Feng, Julian Katz-Samuels, Richard Yuanzhe Pang, Sujan Gonugondla, Hunter Lang, Yue Yu, Yundi Qian, Maryam Fazel-Zarandi, Licheng Yu, Amine Benhalloum, Hany Awadalla, Manaal Faruqui
cs.AI
Abstract
Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-prompted instructions, remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and of reliable, interpretable reward signals. In this work, we introduce AdvancedIF (we will release this benchmark soon), a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs' ability to follow complex, multi-turn, and system-level instructions. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for instruction following. Extensive experiments demonstrate that RIFL substantially improves the instruction-following abilities of LLMs, achieving a 6.7% absolute gain on AdvancedIF and strong results on public benchmarks. Our ablation studies confirm the effectiveness of each component in RIFL. This work establishes rubrics as a powerful tool for both training and evaluating advanced IF in LLMs, paving the way for more capable and reliable AI systems.
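To make the rubric-as-reward idea concrete, the following is a minimal sketch (not the paper's implementation) of how per-criterion rubric verdicts can be aggregated into a shaped scalar reward for RL. The `RubricCriterion` class, the callable `check` standing in for the finetuned rubric verifier, and the all-criteria bonus in `rubric_reward` are all illustrative assumptions, not details from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricCriterion:
    """One expert-style rubric item. `check` is a stand-in for the
    finetuned rubric verifier described in the abstract (hypothetical)."""
    description: str
    check: Callable[[str], bool]


def rubric_reward(response: str,
                  rubric: List[RubricCriterion],
                  bonus_if_all: float = 0.2) -> float:
    """Fraction of rubric criteria satisfied, with a small shaping
    bonus when every criterion passes (an illustrative shaping choice)."""
    if not rubric:
        return 0.0
    passed = sum(c.check(response) for c in rubric)
    reward = passed / len(rubric)
    if passed == len(rubric):
        reward += bonus_if_all
    return reward


# Example usage with toy, string-matching "verifiers":
rubric = [
    RubricCriterion("answers in English", lambda r: r.isascii()),
    RubricCriterion("mentions a deadline", lambda r: "deadline" in r.lower()),
    RubricCriterion("stays under 50 words", lambda r: len(r.split()) <= 50),
]
print(rubric_reward("The deadline is Friday.", rubric))  # 1.2 (all pass + bonus)
```

In practice the boolean checks would be replaced by calls to the finetuned verifier model, and the resulting scalar would feed a standard RL objective; the aggregation and bonus scheme here are only one plausible form of reward shaping.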