AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI

Abstract

Systematic literature reviews (SLRs) are essential for synthesizing scientific evidence, but they are costly, time-intensive, and difficult to scale, which creates a bottleneck for evidence-based policy. We study whether large language models can automate the complete systematic review workflow, from article retrieval and screening through data extraction to report synthesis. Applied to epidemiological reviews of nine WHO-designated priority pathogens and validated against expert-curated ground truth, our open-source agentic pipeline (AgentSLR) achieves performance comparable to human researchers while reducing review time from approximately 7 weeks to 20 hours (a 58x speed-up). Our comparison of five frontier models reveals that SLR performance is driven less by model size or inference cost than by each model's distinctive capabilities. Through human-in-the-loop validation, we identify key failure modes. Our results demonstrate that agentic AI can substantially accelerate scientific evidence synthesis in specialised domains.
