SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

May 29, 2025
作者: Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You
cs.AI

Abstract

Recent advancements in large language model (LLM) agents have significantly accelerated scientific discovery automation, yet concurrently raised critical ethical and safety concerns. To systematically address these challenges, we introduce SafeScientist, an innovative AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. To achieve comprehensive safety oversight, we integrate multiple defensive mechanisms, including prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer component. Complementing SafeScientist, we propose SciSafetyBench, a novel benchmark specifically designed to evaluate AI safety in scientific contexts, comprising 240 high-risk scientific tasks across 6 domains, alongside 30 specially designed scientific tools and 120 tool-related risk tasks. Extensive experiments demonstrate that SafeScientist significantly improves safety performance by 35% compared to traditional AI scientist frameworks, without compromising scientific output quality. Additionally, we rigorously validate the robustness of our safety pipeline against diverse adversarial attack methods, further confirming the effectiveness of our integrated approach. The code and data will be available at https://github.com/ulab-uiuc/SafeScientist. Warning: this paper contains example data that may be offensive or harmful.
