R^2AI: Towards Resistant and Resilient AI in an Evolving World

Abstract

In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into ``Make AI Safe'', which applies post-hoc alignment and guardrails but remains brittle and reactive, and ``Make Safe AI'', which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution as a new formulation of the ``Make Safe AI'' paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R^2AI (Resistant and Resilient AI), a practical framework that unites resistance against known threats with resilience to unforeseen risks. R^2AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintaining continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.

