SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

Abstract

Recent advances in large language model (LLM) agents have significantly accelerated the automation of scientific discovery, but they have also raised critical ethical and safety concerns. To address these challenges systematically, we introduce SafeScientist, an AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and emphasizes safety throughout the research process. To achieve comprehensive safety oversight, we integrate multiple defensive mechanisms: prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer component. Complementing SafeScientist, we propose SciSafetyBench, a novel benchmark specifically designed to evaluate AI safety in scientific contexts. It comprises 240 high-risk scientific tasks across 6 domains, alongside 30 specially designed scientific tools and 120 tool-related risk tasks. Extensive experiments demonstrate that SafeScientist improves safety performance by 35% compared to traditional AI scientist frameworks, without compromising the quality of scientific output. Additionally, we rigorously validate the robustness of our safety pipeline against diverse adversarial attack methods, further confirming the effectiveness of our integrated approach. The code and data will be available at https://github.com/ulab-uiuc/SafeScientist.

Warning: this paper contains example data that may be offensive or harmful.
