Towards best practices in AGI safety and governance: A survey of expert opinion
May 11, 2023
Authors: Jonas Schuett, Noemi Dreksler, Markus Anderljung, David McCaffary, Lennart Heim, Emma Bluemke, Ben Garfinkel
cs.AI
Abstract
A number of leading AI companies, including OpenAI, Google DeepMind, and
Anthropic, have the stated goal of building artificial general intelligence
(AGI) - AI systems that achieve or exceed human performance across a wide range
of cognitive tasks. In pursuing this goal, they may develop and deploy AI
systems that pose particularly significant risks. While they have already taken
some measures to mitigate these risks, best practices have not yet emerged. To
support the identification of best practices, we sent a survey to 92 leading
experts from AGI labs, academia, and civil society and received 51 responses.
Participants were asked how much they agreed with 50 statements about what AGI
labs should do. Our main finding is that participants, on average, agreed with
all of them. Many statements received extremely high levels of agreement. For
example, 98% of respondents somewhat or strongly agreed that AGI labs should
conduct pre-deployment risk assessments, dangerous capabilities evaluations,
third-party model audits, and red teaming, and impose safety restrictions on
model usage.
Ultimately, our list of statements may serve as a helpful foundation for
efforts to develop best practices, standards, and regulations for AGI labs.