CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production
March 2, 2026
Authors: Yixin Nie, Lin Guan, Zhongyao Ma, Anchit Gupta, Yipin Zhou, Xiao Li, Zhengping Zhou, Raymond Zeng, Gelin Zhou, Shigan Chu, Ajay Thampi, Wancen Mu, Nathan Shuster, Ketong Wang, Lin Chen, Jason Brewer, Derek Hao Hu, Alexander McCauley, Jason Weston, Sem Park, Na Zhang, Kevin Tang
cs.AI
Abstract
This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. Starting from LLaMA 3.1, we refined models across 15 generations using data from both internal and external real-user traffic. Through continuous deployments from July 2024 to April 2025, we conducted controlled 7-day A/B tests showing consistent engagement improvements: 7 of 8 newly deployed models demonstrated positive lift over the baseline, with the strongest performers achieving up to 8.8% improvement in engagement breadth and 19.4% in engagement depth. We also observed substantial gains in steerability, with instruction following increasing from 59.2% to 84.8% and instruction violations decreasing from 26.6% to 5.8%. We detail the CharacterFlywheel process, which integrates data curation, reward modeling to estimate and interpolate the landscape of engagement metrics, supervised fine-tuning (SFT), reinforcement learning (RL), and both offline and online evaluation to ensure reliable progress at each optimization step. We also discuss our methods for preventing overfitting and navigating production dynamics at scale. These contributions advance the scientific rigor and understanding of LLMs in social applications serving millions of users.
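For concreteness, the engagement improvements quoted above are relative lifts of the treatment arm over the control (baseline) arm of each A/B test. A minimal sketch of that calculation, using invented metric values (the paper's underlying per-arm data are not given here):

```python
# Relative-lift calculation as used in A/B test reporting.
# The metric values below are invented for illustration only; they are
# not the paper's data, merely numbers chosen to reproduce an 8.8% lift.

def relative_lift(treatment: float, control: float) -> float:
    """Relative improvement of treatment over control, as a percentage."""
    return (treatment - control) / control * 100.0

# Hypothetical per-user averages from a 7-day A/B test window.
control_breadth = 4.000    # e.g. conversations per user, baseline model
treatment_breadth = 4.352  # same metric, candidate model

print(f"engagement-breadth lift: "
      f"{relative_lift(treatment_breadth, control_breadth):.1f}%")
```

Note that a lift computed this way is sensitive to the control arm's scale, which is one reason controlled same-window A/B tests (rather than before/after comparisons) are used to attribute changes to the model.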