Towards Conversational Diagnostic AI

Abstract

At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase the accessibility, consistency, and quality of care. However, approximating clinicians' expertise remains an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM)-based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play-based simulated environment with automated feedback mechanisms to scale learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically meaningful axes of performance, including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and on 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text chat, which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
