Qwen3Guard Technical Report
October 16, 2025
Authors: Haiquan Zhao, Chenhan Yuan, Fei Huang, Xiaomeng Hu, Yichang Zhang, An Yang, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin, Baosong Yang, Chen Cheng, Jialong Tang, Jiandong Jiang, Jianwei Zhang, Jijie Xu, Ming Yan, Minmin Sun, Pei Zhang, Pengjun Xie, Qiaoyu Tang, Qin Zhu, Rong Zhang, Shibin Wu, Shuo Zhang, Tao He, Tianyi Tang, Tingyu Xia, Wei Liao, Weizhou Shen, Wenbiao Yin, Wenmeng Zhou, Wenyuan Yu, Xiaobin Wang, Xiaodong Deng, Xiaodong Xu, Xinyu Zhang, Yang Liu, Yeqiu Li, Yi Zhang, Yong Jiang, Yu Wan, Yuxin Zhou
cs.AI
Abstract
As large language models (LLMs) become more capable and widely used, ensuring
the safety of their outputs is increasingly critical. Existing guardrail
models, though useful in static evaluation settings, face two major limitations
in real-world applications: (1) they typically output only binary "safe/unsafe"
labels, which can be interpreted inconsistently across diverse safety policies,
rendering them incapable of accommodating varying safety tolerances across
domains; and (2) they require complete model outputs before performing safety
checks, making them fundamentally incompatible with streaming LLM inference,
thereby preventing timely intervention during generation and increasing
exposure to harmful partial outputs. To address these challenges, we present
Qwen3Guard, a series of multilingual safety guardrail models with two
specialized variants: Generative Qwen3Guard, which casts safety classification
as an instruction-following task to enable fine-grained tri-class judgments
(safe, controversial, unsafe); and Stream Qwen3Guard, which introduces a
token-level classification head for real-time safety monitoring during
incremental text generation. Both variants are available in three sizes (0.6B,
4B, and 8B parameters) and support up to 119 languages and dialects, providing
comprehensive, scalable, and low-latency safety moderation for global LLM
deployments. Evaluated across English, Chinese, and multilingual benchmarks,
Qwen3Guard achieves state-of-the-art performance in both prompt and response
safety classification. All models are released under the Apache 2.0 license for
public use.
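
The two ideas in the abstract, tri-class labels that different policies can read differently and per-token checks during streaming, can be illustrated with a minimal Python sketch. Everything here is hypothetical and is not the Qwen3Guard API: `Label`, `is_blocked`, `moderate_stream`, and `toy_classifier` are illustrative names, the strict/loose handling of "controversial" is an assumed policy choice, and the keyword-based classifier is a placeholder for a real token-level classification head.

```python
from enum import Enum
from typing import Callable, Iterable, Iterator, List


class Label(Enum):
    SAFE = "safe"
    CONTROVERSIAL = "controversial"
    UNSAFE = "unsafe"


def is_blocked(label: Label, strict: bool) -> bool:
    """Map a tri-class label to a block/allow decision.

    Assumed policy: a strict deployment blocks "controversial" content,
    a loose deployment allows it. "unsafe" is always blocked.
    """
    if label is Label.UNSAFE:
        return True
    if label is Label.CONTROVERSIAL:
        return strict
    return False


def moderate_stream(
    tokens: Iterable[str],
    classify_prefix: Callable[[List[str]], Label],
    strict: bool,
) -> Iterator[str]:
    """Yield tokens one by one, stopping as soon as the prefix is blocked."""
    prefix: List[str] = []
    for token in tokens:
        prefix.append(token)
        # Check the text generated so far before releasing the new token.
        if is_blocked(classify_prefix(prefix), strict):
            return
        yield token


def toy_classifier(prefix: List[str]) -> Label:
    """Hypothetical stand-in for a token-level safety head (keyword match)."""
    if "HARM" in prefix:
        return Label.UNSAFE
    if "EDGY" in prefix:
        return Label.CONTROVERSIAL
    return Label.SAFE


if __name__ == "__main__":
    stream = ["how", "to", "EDGY", "then", "HARM", "end"]
    print(list(moderate_stream(stream, toy_classifier, strict=False)))
    print(list(moderate_stream(stream, toy_classifier, strict=True)))
```

In this sketch the same token stream is cut off earlier under the strict policy than under the loose one, and in both cases the offending token is withheld rather than emitted and retracted. A real deployment would replace `toy_classifier` with a model call and would likely reuse cached state instead of reclassifying the whole prefix at every step.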