SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

Abstract

Recent advancements in large language model (LLM) agents have significantly accelerated the automation of scientific discovery, yet have concurrently raised critical ethical and safety concerns. To systematically address these challenges, we introduce SafeScientist, an AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. To achieve comprehensive safety oversight, we integrate multiple defensive mechanisms, including prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer component. Complementing SafeScientist, we propose SciSafetyBench, a novel benchmark specifically designed to evaluate AI safety in scientific contexts, comprising 240 high-risk scientific tasks across 6 domains, alongside 30 specially designed scientific tools and 120 tool-related risk tasks. Extensive experiments demonstrate that SafeScientist improves safety performance by 35% compared to traditional AI scientist frameworks, without compromising scientific output quality. Additionally, we rigorously validate the robustness of our safety pipeline against diverse adversarial attack methods, further confirming the effectiveness of our integrated approach. The code and data will be available at https://github.com/ulab-uiuc/SafeScientist.

Warning: this paper contains example data that may be offensive or harmful.
