Qwen3Guard Technical Report

October 16, 2025
Authors: Haiquan Zhao, Chenhan Yuan, Fei Huang, Xiaomeng Hu, Yichang Zhang, An Yang, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin, Baosong Yang, Chen Cheng, Jialong Tang, Jiandong Jiang, Jianwei Zhang, Jijie Xu, Ming Yan, Minmin Sun, Pei Zhang, Pengjun Xie, Qiaoyu Tang, Qin Zhu, Rong Zhang, Shibin Wu, Shuo Zhang, Tao He, Tianyi Tang, Tingyu Xia, Wei Liao, Weizhou Shen, Wenbiao Yin, Wenmeng Zhou, Wenyuan Yu, Xiaobin Wang, Xiaodong Deng, Xiaodong Xu, Xinyu Zhang, Yang Liu, Yeqiu Li, Yi Zhang, Yong Jiang, Yu Wan, Yuxin Zhou
cs.AI

Abstract

As large language models (LLMs) become more capable and widely used, ensuring the safety of their outputs is increasingly critical. Existing guardrail models, though useful in static evaluation settings, face two major limitations in real-world applications: (1) they typically output only binary "safe/unsafe" labels, which can be interpreted inconsistently across diverse safety policies, rendering them incapable of accommodating varying safety tolerances across domains; and (2) they require complete model outputs before performing safety checks, making them fundamentally incompatible with streaming LLM inference, thereby preventing timely intervention during generation and increasing exposure to harmful partial outputs. To address these challenges, we present Qwen3Guard, a series of multilingual safety guardrail models with two specialized variants: Generative Qwen3Guard, which casts safety classification as an instruction-following task to enable fine-grained tri-class judgments (safe, controversial, unsafe); and Stream Qwen3Guard, which introduces a token-level classification head for real-time safety monitoring during incremental text generation. Both variants are available in three sizes (0.6B, 4B, and 8B parameters) and support up to 119 languages and dialects, providing comprehensive, scalable, and low-latency safety moderation for global LLM deployments. Evaluated across English, Chinese, and multilingual benchmarks, Qwen3Guard achieves state-of-the-art performance in both prompt and response safety classification. All models are released under the Apache 2.0 license for public use.
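
The two ideas in the abstract, tri-class labels that different policies can interpret differently and token-level checks during streaming, can be illustrated with a minimal Python sketch. This is not the Qwen3Guard API or the authors' implementation: `Label`, `should_block`, `toy_token_classifier`, and `moderate_stream` are hypothetical names, and the keyword lookup is only a stand-in for a learned token-level classification head.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class Label(Enum):
    SAFE = "safe"
    CONTROVERSIAL = "controversial"
    UNSAFE = "unsafe"


def should_block(label: Label, strict: bool) -> bool:
    """Map a tri-class label to a block/allow decision under a policy.

    Assumption: a strict policy blocks "controversial" content and a
    loose policy allows it; "unsafe" is blocked under both.
    """
    if label is Label.UNSAFE:
        return True
    if label is Label.CONTROVERSIAL:
        return strict
    return False


def toy_token_classifier(prefix: List[str]) -> Label:
    """Hypothetical stand-in for a token-level classification head.

    A real model would score the whole prefix; this toy version only
    looks up the newest token in two tiny keyword sets.
    """
    last = prefix[-1].lower()
    if last in {"explosive"}:
        return Label.UNSAFE
    if last in {"lockpick"}:
        return Label.CONTROVERSIAL
    return Label.SAFE


def moderate_stream(
    tokens: Iterable[str],
    classify: Callable[[List[str]], Label],
    strict: bool,
) -> Tuple[List[str], bool]:
    """Check each token as it arrives and stop at the first blocked one.

    Returns the tokens emitted before the stop and a flag that is True
    when generation was interrupted.
    """
    emitted: List[str] = []
    prefix: List[str] = []
    for token in tokens:
        prefix.append(token)
        if should_block(classify(prefix), strict):
            return emitted, True  # intervene before the token is shown
        emitted.append(token)
    return emitted, False


if __name__ == "__main__":
    stream = ["how", "to", "lockpick", "a", "door"]
    print(moderate_stream(stream, toy_token_classifier, strict=True))
    print(moderate_stream(stream, toy_token_classifier, strict=False))
```

In this sketch the same stream is cut off after two tokens under the strict policy and passes in full under the loose one, which is the kind of policy-dependent behavior a binary safe/unsafe label cannot express. Checking inside the loop, rather than after the full response, is what allows intervention before a flagged token reaches the user.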