CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production
March 2, 2026
Authors: Yixin Nie, Lin Guan, Zhongyao Ma, Anchit Gupta, Yipin Zhou, Xiao Li, Zhengping Zhou, Raymond Zeng, Gelin Zhou, Shigan Chu, Ajay Thampi, Wancen Mu, Nathan Shuster, Ketong Wang, Lin Chen, Jason Brewer, Derek Hao Hu, Alexander McCauley, Jason Weston, Sem Park, Na Zhang, Kevin Tang
cs.AI
Abstract
This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. Starting from LLaMA 3.1, we refined models across 15 generations using data from both internal and external real-user traffic. Through continuous deployments from July 2024 to April 2025, we conducted controlled 7-day A/B tests showing consistent engagement improvements: 7 of 8 newly deployed models demonstrated positive lift over the baseline, with the strongest performers achieving up to 8.8% improvement in engagement breadth and 19.4% in engagement depth. We also observed substantial gains in steerability, with instruction following increasing from 59.2% to 84.8% and instruction violations decreasing from 26.6% to 5.8%. We detail the CharacterFlywheel process, which integrates data curation, reward modeling to estimate and interpolate the landscape of engagement metrics, supervised fine-tuning (SFT), reinforcement learning (RL), and both offline and online evaluation to ensure reliable progress at each optimization step. We also discuss our methods for preventing overfitting and navigating production dynamics at scale. These contributions advance the scientific rigor and understanding of LLMs in social applications serving millions of users.