

R^2AI: Towards Resistant and Resilient AI in an Evolving World

September 8, 2025
Authors: Youbang Sun, Xiang Wang, Jie Fu, Chaochao Lu, Bowen Zhou
cs.AI

Abstract

In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution as a new formulation of the "Make Safe AI" paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R^2AI (Resistant and Resilient AI) as a practical framework that unites resistance against known threats with resilience to unforeseen risks. R^2AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.