LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context
November 4, 2025
Authors: Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
cs.AI
Abstract
In this work, we propose LiveSecBench, a dynamic and continuously updated safety benchmark tailored to Chinese-language LLM application scenarios. LiveSecBench evaluates models across six critical dimensions (Legality, Ethics, Factuality, Privacy, Adversarial Robustness, and Reasoning Safety), all rooted in Chinese legal and social frameworks. The benchmark stays current through a dynamic update schedule that incorporates new threat vectors, such as the planned addition of Text-to-Image Generation Safety and Agentic Safety in the next update. To date, LiveSecBench (v251030) has evaluated 18 LLMs, providing a landscape of AI safety for Chinese-language models. The leaderboard is publicly accessible at https://livesecbench.intokentech.cn/.
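To make the six-dimension structure concrete, the sketch below aggregates per-dimension scores into a single overall figure. Only the dimension names come from the abstract; the scoring scale, the unweighted-mean aggregation, and the function names are illustrative assumptions, not LiveSecBench's actual methodology.

```python
# Hypothetical aggregation sketch for the six LiveSecBench dimensions.
# Assumption: each dimension is scored in [0, 1]; the overall score is an
# unweighted mean. The real benchmark's scoring scheme may differ.
DIMENSIONS = [
    "Legality", "Ethics", "Factuality",
    "Privacy", "Adversarial Robustness", "Reasoning Safety",
]

def overall_safety(scores: dict[str, float]) -> float:
    """Return the unweighted mean over the six dimensions."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

if __name__ == "__main__":
    # Illustrative input: a model scoring 0.9 on every dimension.
    example = {d: 0.9 for d in DIMENSIONS}
    print(f"overall: {overall_safety(example):.2f}")
```

A weighted mean (e.g., emphasizing Legality for compliance-focused deployments) would be a natural variant; the flat average is chosen here only for simplicity.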