Meissa: Multi-modal Medical Agentic Intelligence

Abstract

Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely almost entirely on frontier models (e.g., GPT), whose API-based deployment incurs high cost and latency and poses privacy risks that conflict with on-premise clinical requirements. We present Meissa, a lightweight 4B-parameter medical MM-LLM that brings agentic capability offline. Instead of imitating static answers, Meissa distills structured trajectories from frontier models and thereby learns both when to engage external interaction (strategy selection) and how to execute multi-step interaction (strategy execution). Specifically, we propose: (1) Unified trajectory modeling: trajectories (reasoning and action traces) are represented within a single state-action-observation formalism, allowing one model to generalize across heterogeneous medical environments. (2) Three-tier stratified supervision: the model's own errors trigger progressive escalation from direct reasoning to tool-augmented and then multi-agent interaction, so that the model explicitly learns difficulty-aware strategy selection. (3) Prospective-retrospective supervision: pairing exploratory forward traces with hindsight-rationalized execution traces enables stable learning of effective interaction policies. Trained on 40K curated trajectories, Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks spanning radiology, pathology, and clinical reasoning. With over 25x fewer parameters than typical frontier models such as Gemini-3, Meissa operates fully offline and achieves 22x lower end-to-end latency than API-based deployment. Data, models, and environments are released at https://github.com/Schuture/Meissa.
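
To make the first two ideas concrete, the following is a minimal sketch, not the released implementation. It assumes that a trajectory can be stored as a list of (state, action, observation) steps and that escalation means trying cheaper strategies first and moving to the next tier only when the final answer is wrong. All names (`Step`, `Trajectory`, `collect_stratified`, the tier labels, and the stub solvers) are hypothetical and do not come from the Meissa repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One turn in a single state-action-observation format (hypothetical schema)."""
    state: str        # what the model sees before acting
    action: str       # direct answer, tool call, or message to another agent
    observation: str  # what the environment returns ("" after a final answer)


@dataclass
class Trajectory:
    tier: str                       # which strategy produced this trajectory
    steps: List[Step] = field(default_factory=list)

    def final_answer(self) -> str:
        # Assumption: the last action of a trajectory is the final answer.
        return self.steps[-1].action if self.steps else ""


# A solver maps a question to a list of steps; real solvers would call models/tools.
Solver = Callable[[str], List[Step]]


def collect_stratified(
    question: str, gold: str, tiers: List[Tuple[str, Solver]]
) -> Optional[Trajectory]:
    """Try tiers from cheapest to most expensive; an error escalates to the next tier."""
    for name, solve in tiers:
        traj = Trajectory(tier=name, steps=solve(question))
        if traj.final_answer() == gold:
            return traj  # keep the cheapest tier that answers correctly
    return None  # no tier solved the case; it yields no training trajectory


# Stub solvers standing in for the three tiers (illustrative outputs only).
def direct(q: str) -> List[Step]:
    return [Step(q, "pneumonia", "")]


def tool_augmented(q: str) -> List[Step]:
    return [
        Step(q, "call: lung_segmentation", "opacity with meniscus sign"),
        Step(q + " | opacity with meniscus sign", "pleural effusion", ""),
    ]


def multi_agent(q: str) -> List[Step]:
    return [Step(q, "ask: radiologist_agent", "suggests pneumothorax"),
            Step(q + " | suggests pneumothorax", "pneumothorax", "")]


TIERS = [("direct", direct), ("tool_augmented", tool_augmented),
         ("multi_agent", multi_agent)]
```

In this sketch, a case whose gold label is `"pleural effusion"` fails at the direct tier and is kept as a two-step `tool_augmented` trajectory. The tier label attached to each kept trajectory is what would give a student model a difficulty-aware signal for strategy selection. How the actual pipeline judges correctness and which model runs each tier are not specified in the abstract.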