MOA: Multi-Objective Alignment for Role-Playing Agents
December 10, 2025
Authors: Chonghua Liao, Ke Wang, Yuchuan Wu, Fei Huang, Yongbin Li
cs.AI
Abstract
Role-playing agents (RPAs) must simultaneously master several conflicting skills -- following multi-turn instructions, exhibiting domain knowledge, and maintaining a consistent linguistic style. Existing work either relies on supervised fine-tuning (SFT), which over-fits surface cues and yields low output diversity, or applies reinforcement learning (RL) that fails to optimize multiple dimensions jointly for comprehensive RPA improvement. We present MOA (Multi-Objective Alignment), a reinforcement-learning framework that enables multi-dimensional, fine-grained rubric optimization for general RPAs. MOA introduces a novel multi-objective optimization strategy that trains on multiple fine-grained rubrics simultaneously to boost optimization performance. In addition, to balance output diversity and quality, we employ thought-augmented rollouts with off-policy guidance. Extensive experiments on challenging benchmarks such as PersonaGym and RoleMRC show that MOA enables an 8B model to match or even outperform strong baselines such as GPT-4o and Claude across numerous dimensions, demonstrating MOA's great potential for building RPAs that simultaneously meet the demands of role knowledge, persona style, diverse scenarios, and complex multi-turn conversations.
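The abstract only sketches the training recipe, but the core idea of scoring each rollout against several fine-grained rubrics and combining the scores into a single RL reward can be illustrated with a minimal sketch. The rubric names, weights, and the judge_rubric scoring stub below are illustrative assumptions, not the paper's actual rubrics or implementation.

```python
# Minimal sketch of multi-objective rubric reward aggregation for RL fine-tuning.
# Rubric names, weights, and the judge stub are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class RubricScore:
    name: str
    score: float   # judge score in [0, 1]
    weight: float  # relative importance of this rubric


def judge_rubric(response: str, rubric: str) -> float:
    """Placeholder for an LLM-judge call that scores `response` against `rubric`."""
    return 0.5  # stub value; a real system would query a judge model here


def multi_objective_reward(response: str) -> float:
    """Score one rollout on several fine-grained rubrics and combine the scores."""
    rubrics = [
        ("instruction_following", 0.4),  # follows multi-turn instructions
        ("role_knowledge", 0.3),         # exhibits the character's domain knowledge
        ("persona_style", 0.3),          # keeps a consistent linguistic style
    ]
    scores = [RubricScore(n, judge_rubric(response, n), w) for n, w in rubrics]
    # Weighted average; a real multi-objective scheme might instead balance
    # per-rubric advantages rather than collapsing them into one scalar.
    return sum(s.weight * s.score for s in scores) / sum(s.weight for s in scores)


if __name__ == "__main__":
    print(multi_objective_reward("In character: ..."))
```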