Towards Conversational Diagnostic AI

January 11, 2024
Authors: Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singhal, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Alan Karthikesalingam, Vivek Natarajan
cs.AI

Abstract

At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
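The abstract's description of a self-play simulated environment with automated feedback can be pictured as an inner dialogue loop between a doctor agent and a simulated patient, wrapped in an outer loop where a critic scores each transcript and that critique conditions the next attempt. The sketch below is a minimal, hypothetical illustration of that idea only; `query_llm`, the role prompts, the rubric, and the loop structure are assumptions for exposition, not the paper's actual training pipeline.

```python
# Hypothetical sketch of a self-play consultation loop with automated critic
# feedback, loosely following the abstract's high-level description.
# All prompts, function names, and the scoring rubric are illustrative stand-ins.
from dataclasses import dataclass, field


def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM backend; swap in a real API client."""
    return "<model response to: " + prompt[:40] + "...>"


@dataclass
class Dialogue:
    vignette: str                                   # simulated patient's condition/context
    turns: list[str] = field(default_factory=list)  # alternating doctor/patient messages


def self_play_episode(vignette: str, feedback: str = "", max_turns: int = 6) -> Dialogue:
    """Inner loop: doctor and patient agents alternate turns over one text consultation."""
    dialogue = Dialogue(vignette=vignette)
    for _ in range(max_turns):
        doctor = query_llm(
            "You are a physician taking a history over text chat.\n"
            f"Feedback from earlier attempts: {feedback}\nTranscript: {dialogue.turns}"
        )
        dialogue.turns.append(f"Doctor: {doctor}")
        patient = query_llm(
            f"You are a patient whose situation is: {vignette}\nTranscript: {dialogue.turns}"
        )
        dialogue.turns.append(f"Patient: {patient}")
    return dialogue


def critic_feedback(dialogue: Dialogue) -> str:
    """Automated critique along illustrative axes (history-taking, reasoning, empathy)."""
    rubric = ("Critique this consultation on history-taking, diagnostic reasoning, "
              "communication skills, and empathy.")
    return query_llm(rubric + "\n" + "\n".join(dialogue.turns))


def refine(vignette: str, rounds: int = 3) -> list[str]:
    """Outer loop: replay the scenario, conditioning each attempt on accumulated feedback."""
    feedback: list[str] = []
    for _ in range(rounds):
        episode = self_play_episode(vignette, feedback=" | ".join(feedback))
        feedback.append(critic_feedback(episode))
    return feedback


if __name__ == "__main__":
    print(refine("45-year-old with intermittent chest pain on exertion"))
```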