

RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

November 10, 2025
作者: Fei Zhao, Chonggang Lu, Haofu Qian, Fangcheng Shi, Zijie Meng, Jianzhao Huang, Xu Tang, Zheyong Xie, Zheyu Ye, Zhe Xu, Yao Hu, Shaosheng Cao
cs.AI

Abstract

As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifting norms and slang, and multilingual, culturally diverse corpora that induce sharp distribution shift. Supervised fine-tuning (SFT) can specialize models but often triggers a "seesaw" between in-distribution gains and out-of-distribution robustness, especially for smaller models. To address these challenges, we introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation. The pipeline consists of three stages: (1) Exploratory Learning on curated SNS corpora to establish initial alignment and identify systematic weaknesses; (2) Targeted Fine-Tuning that selectively applies SFT to the diagnosed gaps while mixing in a small fraction of general data to mitigate forgetting; and (3) Refinement Learning that re-applies RL with SNS-centric signals to consolidate improvements and harmonize trade-offs across tasks. Across various tasks spanning three categories, our 4B-scale model delivers an average improvement of about 2.41 points over the 7B sub-optimal baseline. Additionally, RedOne 2.0 achieves an average performance lift of about 8.74 points over the base model with less than half the data required by the SFT-centric method RedOne, evidencing superior data efficiency and stability at compact scales. Overall, RedOne 2.0 establishes a competitive, cost-effective baseline for domain-specific LLMs in SNS scenarios, advancing capability without sacrificing robustness.
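To make the three-stage paradigm concrete, the following is a minimal Python sketch of its control flow only: exploratory RL, targeted SFT on diagnosed gaps with a small general-data mix, then refinement RL. Every name here (Model, rl_update, sft_update, diagnose_gaps, post_train) is a hypothetical placeholder for illustration, not the authors' implementation or any released API.

```python
# Minimal sketch of a RedOne 2.0-style three-stage post-training loop.
# All identifiers are hypothetical placeholders; real training would call
# into an RL/SFT framework rather than these stand-in functions.

from dataclasses import dataclass, field


@dataclass
class Model:
    """Stand-in for an LLM checkpoint; records which stages were applied."""
    history: list = field(default_factory=list)


def rl_update(model: Model, corpus: str, tag: str) -> Model:
    # Stages 1 and 3: reinforcement learning driven by SNS-centric reward signals.
    model.history.append((tag, corpus))
    return model


def diagnose_gaps(model: Model, eval_suite: list[str]) -> list[str]:
    # After exploratory learning, identify systematic weaknesses to target with SFT.
    return [task for task in eval_suite if task.startswith("weak:")]


def sft_update(model: Model, gap_data: list[str], general_mix: float = 0.1) -> Model:
    # Stage 2: supervised fine-tuning on the diagnosed gaps, mixing in a small
    # fraction of general data to mitigate catastrophic forgetting.
    model.history.append(("sft", gap_data, general_mix))
    return model


def post_train(base_model: Model, sns_corpus: str, eval_suite: list[str]) -> Model:
    m = rl_update(base_model, sns_corpus, tag="exploratory_rl")   # Stage 1
    gaps = diagnose_gaps(m, eval_suite)
    m = sft_update(m, gaps)                                       # Stage 2
    m = rl_update(m, sns_corpus, tag="refinement_rl")             # Stage 3
    return m


if __name__ == "__main__":
    model = post_train(
        Model(),
        sns_corpus="curated_sns_corpus",
        eval_suite=["weak:slang_qa", "post_translation", "weak:norm_classification"],
    )
    print(model.history)
```

The ordering is the point of the sketch: RL brackets the SFT stage, so supervised updates are confined to diagnosed gaps and the final RL pass reconciles trade-offs across tasks rather than letting SFT dominate the adaptation.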