Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

February 21, 2025
Authors: Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, Marc-Antoine Rondeau, Pierre-Luc St-Charles, David Williams-King
cs.AI

Abstract

The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.
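The two-component design described in the abstract lends itself to a small illustration. Below is a minimal, hypothetical Python sketch, not the paper's implementation: a world model that keeps an explicit posterior over candidate theories explaining the data, and a question-answering inference machine that reports probabilities by marginalizing over that posterior, so disagreement between surviving theories surfaces as uncertainty rather than an overconfident point answer. The same probability estimate can back the guardrail use mentioned above, vetoing a proposed agent action when the estimated probability of harm exceeds a threshold. All names (`Theory`, `WorldModel`, `InferenceMachine`, `guardrail`) and the toy Bayesian update are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of the non-agentic "Scientist AI" structure from the
# abstract: a world model with an explicit distribution over theories, plus a
# question-answering inference machine. Names and the toy Bayesian update are
# illustrative assumptions, not the paper's implementation.

@dataclass
class Theory:
    name: str
    prior: float
    likelihood: Callable[[str], float]   # P(observation | theory)
    answer_prob: Callable[[str], float]  # P(answer "yes" to question | theory)

class WorldModel:
    """Maintains an explicit distribution over theories that explain the data."""

    def __init__(self, theories: List[Theory]):
        self.theories = theories
        self.weights = {t.name: t.prior for t in theories}

    def update(self, observation: str) -> None:
        # Bayesian update: reweight each theory by how well it explains
        # the new observation, then renormalize.
        for t in self.theories:
            self.weights[t.name] *= t.likelihood(observation)
        z = sum(self.weights.values()) or 1.0
        self.weights = {k: v / z for k, v in self.weights.items()}

class InferenceMachine:
    """Answers questions with probabilities, never point certainties."""

    def __init__(self, world_model: WorldModel):
        self.wm = world_model

    def prob(self, question: str) -> float:
        # Marginalize the answer over all surviving theories, so residual
        # disagreement between theories shows up as explicit uncertainty.
        return sum(
            self.wm.weights[t.name] * t.answer_prob(question)
            for t in self.wm.theories
        )

def guardrail(p_harm: float, threshold: float = 0.01) -> bool:
    """Allow an agent's proposed action only if estimated harm risk is low."""
    return p_harm < threshold  # True = allow, False = block

if __name__ == "__main__":
    # Two toy theories that disagree about whether a proposed action is harmful.
    benign = Theory("benign", 0.5,
                    likelihood=lambda obs: 0.8,
                    answer_prob=lambda q: 0.001)
    risky = Theory("risky", 0.5,
                   likelihood=lambda obs: 0.2,
                   answer_prob=lambda q: 0.6)
    wm = WorldModel([benign, risky])
    wm.update("logs from the agent's sandboxed run")
    qa = InferenceMachine(wm)
    p = qa.prob("Would executing this action cause harm?")
    print(f"P(harm) = {p:.3f}, allowed = {guardrail(p)}")
```

The design point the sketch tries to capture is that the system itself only estimates probabilities about the world; blocking an action is a fixed, externally chosen threshold applied to those estimates, not a goal the system pursues, which is what keeps the guardrail non-agentic.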
