Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

Abstract

Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete. Standard retrieval-augmented generation (RAG) fails on this data, mishandling temporal reasoning, cross-document dependencies, and missing metadata. We deploy ACIE (Agentic Clinical Information Extraction) at University Medicine Essen: an on-premise agentic RAG pipeline that reasons over complete patient contexts and grounds every answer in source passages for clinician verification. We quantify the metadata gap, trace the architectural decisions it shaped, and evaluate extraction alongside an independent retrospective lymphoma registry study, in which nuclear-medicine physicians verified every extracted value against its cited sources. Across 7,326 judgments, clinicians accepted 96.5% of extractions, with per-type acceptance ranging from 80% to 99%.