Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following
November 13, 2025
作者: Yun He, Wenzhe Li, Hejia Zhang, Songlin Li, Karishma Mandyam, Sopan Khosla, Yuanhao Xiong, Nanshu Wang, Selina Peng, Beibin Li, Shengjie Bi, Shishir G. Patil, Qi Qi, Shengyu Feng, Julian Katz-Samuels, Richard Yuanzhe Pang, Sujan Gonugondla, Hunter Lang, Yue Yu, Yundi Qian, Maryam Fazel-Zarandi, Licheng Yu, Amine Benhalloum, Hany Awadalla, Manaal Faruqui
cs.AI
Abstract
Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-prompted instructions, remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and reliable, interpretable reward signals. In this work, we introduce AdvancedIF (we will release this benchmark soon), a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs' ability to follow complex, multi-turn, and system-level instructions. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for instruction following. Extensive experiments demonstrate that RIFL substantially improves the instruction-following abilities of LLMs, achieving a 6.7% absolute gain on AdvancedIF and strong results on public benchmarks. Our ablation studies confirm the effectiveness of each component in RIFL. This work establishes rubrics as a powerful tool for both training and evaluating advanced IF in LLMs, paving the way for more capable and reliable AI systems.
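To make the rubric-based reward idea concrete, here is a minimal, hypothetical sketch of how per-criterion rubric checks could be aggregated into a scalar reward for RL. This is not the authors' implementation: in RIFL the verifier is a finetuned LLM judging each rubric criterion, whereas here each criterion is stubbed as a hand-written predicate, and the shaped reward is simply the fraction of satisfied criteria.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Criterion:
    """One rubric item: a name plus a pass/fail check over the response.

    The `check` callable stands in for the finetuned rubric verifier
    described in the paper; here it is a simple predicate for illustration.
    """
    name: str
    check: Callable[[str], bool]

def rubric_reward(response: str, rubric: List[Criterion]) -> float:
    """Shaped reward in [0, 1]: fraction of rubric criteria satisfied."""
    if not rubric:
        return 0.0
    satisfied = sum(c.check(response) for c in rubric)
    return satisfied / len(rubric)

# Example rubric for a hypothetical instruction:
# "Answer in all caps and mention Paris."
example_rubric = [
    Criterion("all caps", str.isupper),
    Criterion("mentions Paris", lambda r: "paris" in r.lower()),
]
```

A usage example: `rubric_reward("PARIS IS IN FRANCE.", example_rubric)` returns `1.0` (both criteria pass), while a lowercase response that mentions Paris scores `0.5`. A dense, per-criterion signal like this is what makes the reward interpretable compared to a single pass/fail judgment.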