

LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context

November 4, 2025
Authors: Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
cs.AI

Abstract

In this work, we propose LiveSecBench, a dynamic, continuously updated safety benchmark tailored to Chinese-language LLM application scenarios. LiveSecBench evaluates models across six critical dimensions (Legality, Ethics, Factuality, Privacy, Adversarial Robustness, and Reasoning Safety), rooted in Chinese legal and social frameworks. The benchmark stays current through a dynamic update schedule that incorporates new threat vectors, such as the planned addition of Text-to-Image Generation Safety and Agentic Safety in the next release. To date, LiveSecBench (v251030) has evaluated 18 LLMs, providing a landscape of AI safety in the Chinese-language context. The leaderboard is publicly accessible at https://livesecbench.intokentech.cn/.
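To make the six-dimension structure concrete, here is a minimal sketch of how per-dimension pass rates could be aggregated into a single leaderboard score. This is purely illustrative and not from the LiveSecBench paper: the dimension names follow the abstract, but the unweighted averaging, the `overall_score` function, and the sample numbers are all hypothetical.

```python
# Hypothetical sketch (not the benchmark's actual scoring): aggregate
# per-dimension safety pass rates into one overall score.

# The six evaluation dimensions named in the abstract.
DIMENSIONS = [
    "Legality", "Ethics", "Factuality",
    "Privacy", "Adversarial Robustness", "Reasoning Safety",
]

def overall_score(per_dimension: dict) -> float:
    """Unweighted mean of pass rates over all six dimensions (illustrative)."""
    missing = set(DIMENSIONS) - set(per_dimension)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(per_dimension[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Example per-dimension pass rates for one model (made-up values).
scores = {
    "Legality": 0.92, "Ethics": 0.88, "Factuality": 0.81,
    "Privacy": 0.95, "Adversarial Robustness": 0.74, "Reasoning Safety": 0.85,
}
print(round(overall_score(scores), 3))  # → 0.858
```

A real benchmark would likely weight dimensions differently or report them separately, as the leaderboard does; the sketch only shows the shape of the data.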