AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI

Abstract

Systematic literature reviews (SLRs) are essential for synthesizing scientific evidence, but they are costly, difficult to scale and time-intensive, creating bottlenecks for evidence-based policy. We study whether large language models can automate the complete systematic review workflow, from article retrieval and screening through data extraction to report synthesis. Applied to epidemiological reviews of nine WHO-designated priority pathogens and validated against expert-curated ground truth, our open-source agentic pipeline (AgentSLR) achieves performance comparable to that of human researchers while reducing review time from approximately 7 weeks to 20 hours (a 58x speed-up). Our comparison of five frontier models reveals that SLR performance depends less on model size or inference cost than on each model's distinctive capabilities. Through human-in-the-loop validation, we identify key failure modes. Our results demonstrate that agentic AI can substantially accelerate scientific evidence synthesis in specialised domains.
