R^2AI: Towards Resistant and Resilient AI in an Evolving World
September 8, 2025
Authors: Youbang Sun, Xiang Wang, Jie Fu, Chaochao Lu, Bowen Zhou
cs.AI
Abstract
In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution as a new formulation of the "Make Safe AI" paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R^2AI (Resistant and Resilient AI) as a practical framework that unites resistance against known threats with resilience to unforeseen risks. R^2AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintaining continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward artificial general intelligence (AGI) and superintelligence (ASI).
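To make the architecture described in the abstract more concrete, below is a minimal Python sketch of the safe-by-coevolution loop as we read it: a fast safety model resists threats that match known patterns, a slow safety model adapts to threats the fast model misses, and a safety wind tunnel supplies adversarial test cases whose lessons feed back into the fast defenses. All class names, methods, and logic here (FastSafetyModel, SlowSafetyModel, SafetyWindTunnel, coevolution_round) are illustrative assumptions of ours; the paper itself does not prescribe an implementation.

# Illustrative sketch only; names and logic are hypothetical, not the authors' implementation.
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Threat:
    """A simulated attack or failure case produced during adversarial testing."""
    description: str


class FastSafetyModel:
    """Resistance: quick rejection of threats that match known patterns."""

    def __init__(self) -> None:
        self.known_patterns: Set[str] = set()

    def blocks(self, threat: Threat) -> bool:
        return threat.description in self.known_patterns

    def learn(self, threat: Threat) -> None:
        self.known_patterns.add(threat.description)


class SlowSafetyModel:
    """Resilience: slower, deliberate analysis of threats the fast model missed."""

    def analyze(self, threat: Threat) -> Threat:
        # Placeholder for deeper reasoning, retraining, or human review.
        return threat


class SafetyWindTunnel:
    """Adversarial simulation that stress-tests the system before deployment."""

    def generate_threats(self, round_id: int) -> List[Threat]:
        # Placeholder: a real wind tunnel would run red-teaming or fuzzing.
        return [Threat(f"attack-{round_id}-{i}") for i in range(3)]


def coevolution_round(fast: FastSafetyModel, slow: SlowSafetyModel,
                      tunnel: SafetyWindTunnel, round_id: int) -> None:
    """One coevolution round: simulate, resist known threats, adapt to new ones, feed back."""
    for threat in tunnel.generate_threats(round_id):
        if fast.blocks(threat):          # resistance to known threats
            continue
        analyzed = slow.analyze(threat)  # resilience to unforeseen threats
        fast.learn(analyzed)             # feedback loop: fast defenses improve


if __name__ == "__main__":
    fast, slow, tunnel = FastSafetyModel(), SlowSafetyModel(), SafetyWindTunnel()
    for r in range(3):
        coevolution_round(fast, slow, tunnel, r)
    print(f"known threat patterns after coevolution: {len(fast.known_patterns)}")

The sketch deliberately keeps the fast path cheap (a pattern lookup) and routes only unrecognized threats to the slow path, mirroring the abstract's pairing of resistance to known threats with resilience to unforeseen ones.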