Towards Conversational Diagnostic AI

Abstract

At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase the accessibility, consistency, and quality of care. However, approximating clinicians' expertise remains an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM)-based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play-based simulated environment with automated feedback mechanisms to scale learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically meaningful axes of performance, including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and on 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text chat, which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
