

SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

May 29, 2025
作者: Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You
cs.AI

Abstract
Recent advancements in large language model (LLM) agents have significantly accelerated scientific discovery automation, yet concurrently raised critical ethical and safety concerns. To systematically address these challenges, we introduce SafeScientist, an innovative AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. To achieve comprehensive safety oversight, we integrate multiple defensive mechanisms, including prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer component. Complementing SafeScientist, we propose SciSafetyBench, a novel benchmark specifically designed to evaluate AI safety in scientific contexts, comprising 240 high-risk scientific tasks across 6 domains, alongside 30 specially designed scientific tools and 120 tool-related risk tasks. Extensive experiments demonstrate that SafeScientist significantly improves safety performance by 35% compared to traditional AI scientist frameworks, without compromising scientific output quality. Additionally, we rigorously validate the robustness of our safety pipeline against diverse adversarial attack methods, further confirming the effectiveness of our integrated approach. The code and data will be available at https://github.com/ulab-uiuc/SafeScientist. Warning: this paper contains example data that may be offensive or harmful.

