RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

November 10, 2025
作者: Fei Zhao, Chonggang Lu, Haofu Qian, Fangcheng Shi, Zijie Meng, Jianzhao Huang, Xu Tang, Zheyong Xie, Zheyu Ye, Zhe Xu, Yao Hu, Shaosheng Cao
cs.AI

Abstract

As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifting norms and slang, and multilingual, culturally diverse corpora that induce sharp distribution shift. Supervised fine-tuning (SFT) can specialize models but often triggers a "seesaw" between in-distribution gains and out-of-distribution robustness, especially for smaller models. To address these challenges, we introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation. The pipeline consists of three stages: (1) Exploratory Learning on curated SNS corpora to establish initial alignment and identify systematic weaknesses; (2) Targeted Fine-Tuning that selectively applies SFT to the diagnosed gaps while mixing in a small fraction of general data to mitigate forgetting; and (3) Refinement Learning that re-applies RL with SNS-centric signals to consolidate improvements and harmonize trade-offs across tasks. Across tasks spanning three categories, our 4B-scale model delivers an average improvement of about 2.41 points over the sub-optimal 7B baseline. Additionally, RedOne 2.0 achieves an average performance lift of about 8.74 points over the base model with less than half the data required by the SFT-centric method RedOne, evidencing superior data efficiency and stability at compact scales. Overall, RedOne 2.0 establishes a competitive, cost-effective baseline for domain-specific LLMs in SNS scenarios, advancing capability without sacrificing robustness.
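
To make the staged paradigm concrete, the following is a minimal, runnable Python sketch of the exploratory-learning → targeted-SFT → refinement-learning control flow described above. Every name here (`Policy`, `diagnose_gaps`, `sns_reward`, the corpus strings, and the 0.6 gap threshold) is a hypothetical placeholder inferred from the abstract, not the authors' implementation; the real system would plug actual RL and SFT updates into the same skeleton.

```python
# Hypothetical sketch of RedOne 2.0's three-stage, RL-prioritized
# post-training loop. All identifiers and data sources are placeholders.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Policy:
    """Stand-in for an LLM policy; records which updates were applied."""
    history: List[str] = field(default_factory=list)

    def rl_update(self, corpus: str, reward_fn: Callable[[str], float]) -> None:
        self.history.append(f"RL on {corpus} (reward={reward_fn.__name__})")

    def sft_update(self, corpus: str) -> None:
        self.history.append(f"SFT on {corpus}")


def diagnose_gaps(eval_scores: Dict[str, float],
                  threshold: float = 0.6) -> List[str]:
    """Stage-1 output: task suites where the policy underperforms."""
    return [task for task, score in eval_scores.items() if score < threshold]


def sns_reward(sample: str) -> float:
    """Placeholder for an SNS-centric reward signal (rule- or model-based)."""
    return 1.0


def post_train(policy: Policy, eval_scores: Dict[str, float]) -> Policy:
    # (1) Exploratory Learning: RL on curated SNS corpora to establish
    #     initial alignment and surface systematic weaknesses.
    policy.rl_update("curated_sns_corpus", sns_reward)
    gaps = diagnose_gaps(eval_scores)

    # (2) Targeted Fine-Tuning: SFT only on the diagnosed gaps, mixed with
    #     a small fraction of general data to mitigate forgetting.
    for task in gaps:
        policy.sft_update(f"{task}_gap_data + small_general_mix")

    # (3) Refinement Learning: re-apply RL with SNS-centric signals to
    #     consolidate gains and harmonize cross-task trade-offs.
    policy.rl_update("sns_task_mix", sns_reward)
    return policy


if __name__ == "__main__":
    scores = {"translation": 0.45, "dialogue": 0.72, "moderation": 0.50}
    trained = post_train(Policy(), scores)
    print("\n".join(trained.history))
```

The design point the sketch captures is the RL-first ordering: RL both opens and closes the pipeline, while SFT is confined to the middle stage and applied only where the stage-1 diagnosis found gaps, which is how the paradigm aims to avoid the SFT "seesaw" between in-distribution gains and out-of-distribution robustness.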