Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
February 16, 2026
Authors: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao
cs.AI
Abstract
To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As the general capabilities of Large Language Models (LLMs) rapidly evolve and agentic AI proliferates, this version of the risk analysis technical report presents an updated and more granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication. Specifically, we introduce more complex scenarios for cyber offense. For persuasion and manipulation, we evaluate the risk of LLM-to-LLM persuasion on newly released LLMs. For strategic deception and scheming, we add a new experiment on emergent misalignment. For uncontrolled AI R&D, we focus on the "mis-evolution" of agents as they autonomously expand their memory substrates and toolsets. We also monitor and evaluate the safety performance of OpenClaw during its interactions on the Moltbook platform. For self-replication, we introduce a new resource-constrained scenario. More importantly, we propose and validate a series of robust mitigation strategies to address these emerging threats, providing a preliminary technical and actionable pathway for the secure deployment of frontier AI. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.