

Towards Conversational Diagnostic AI

January 11, 2024
Authors: Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singhal, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Alan Karthikesalingam, Vivek Natarajan
cs.AI

Abstract

At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
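The abstract describes AMIE's self-play based simulated environment only at a high level. As a purely illustrative aid, the sketch below shows one way such a loop could be organized: a language model plays both the doctor and a vignette-conditioned patient, and a separate critique pass produces automated feedback on the completed dialogue. All names (`llm_generate`, `self_play_episode`), prompts, and structures here are hypothetical stand-ins, not AMIE's published implementation.

```python
# Illustrative sketch of a self-play dialogue loop with automated critique,
# loosely inspired by the setup described in the abstract. Hypothetical code,
# not AMIE's actual implementation.

from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str   # "doctor" or "patient"
    text: str


@dataclass
class SimulatedConsultation:
    vignette: str                       # condition/specialty/context given to the patient agent
    turns: list[Turn] = field(default_factory=list)

    def transcript(self) -> str:
        return "\n".join(f"{t.role}: {t.text}" for t in self.turns)


def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large language model.

    Replace with a real LLM client; it returns a canned reply here so the
    sketch runs end to end.
    """
    return "(model response placeholder)"


def self_play_episode(vignette: str, max_turns: int = 10) -> tuple[SimulatedConsultation, str]:
    """Run one simulated doctor-patient dialogue and return it with automated feedback."""
    consult = SimulatedConsultation(vignette)
    for _ in range(max_turns):
        # Doctor agent asks the next history-taking question given the dialogue so far.
        doctor_msg = llm_generate(
            f"You are a physician taking a history.\n{consult.transcript()}\ndoctor:"
        )
        consult.turns.append(Turn("doctor", doctor_msg))

        # Patient agent answers in character, conditioned on the hidden vignette.
        patient_msg = llm_generate(
            f"You are a patient with: {vignette}\n{consult.transcript()}\npatient:"
        )
        consult.turns.append(Turn("patient", patient_msg))

    # Critic pass scores the dialogue; this kind of automated feedback is what
    # could be fed back to improve the doctor agent across many conditions.
    feedback = llm_generate(
        "Rate the doctor's history-taking, diagnostic reasoning, and empathy, "
        f"and suggest improvements:\n{consult.transcript()}"
    )
    return consult, feedback
```

In the paper's framing, feedback of this kind is what allows learning to scale across diverse disease conditions, specialties, and contexts; the returned feedback string above merely marks where that signal would originate in such a loop.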