Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
February 16, 2026
Authors: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao
cs.AI
Abstract
To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As the general capabilities of Large Language Models (LLMs) rapidly evolve and agentic AI proliferates, this version of the risk analysis technical report presents an updated and more granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication. Specifically, we introduce more complex scenarios for cyber offense. For persuasion and manipulation, we evaluate the risk of LLM-to-LLM persuasion on newly released LLMs. For strategic deception and scheming, we add a new experiment on emergent misalignment. For uncontrolled AI R&D, we focus on the "mis-evolution" of agents as they autonomously expand their memory substrates and toolsets. We also monitor and evaluate the safety performance of OpenClaw during its interactions on the Moltbook platform. For self-replication, we introduce a new resource-constrained scenario. More importantly, we propose and validate a series of robust mitigation strategies to address these emerging threats, providing a preliminary technical and actionable pathway for the secure deployment of frontier AI. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.