Experience Breeds Mastery: Enabling Generalizable Medical Agent Reasoning through Self-Evolving Skill Memory

Abstract

Medical agent systems are increasingly expected to support interactive clinical decision-making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, and difficult to govern. More importantly, they rarely distinguish which memories are truly useful for future reasoning. This limits their ability to accumulate compact and reliable experience for long-horizon clinical reasoning. To close this gap, we propose SkeMex, a post-deployment self-evolution framework that improves medical agents through a skill-based memory without updating model weights. SkeMex distills informative interaction trajectories into structured skills that encode reusable procedural knowledge, and organizes them into a multi-branch repository spanning general, task-specific, and action-level experience. To determine which memories should be reused and retained, SkeMex estimates context-dependent utility from environment feedback and uses it to guide value-aware retrieval and repository governance. A closed-loop "Read-Write-Assess-Govern" lifecycle further supports continual evolution by writing new skills, updating utilities, promoting useful memories, and removing harmful entries. Experiments across diverse clinical tasks show that SkeMex consistently outperforms representative memory-based agents in both offline and online settings. It also generalizes across model backbones and supports transferable skill memory. All data and code will be released publicly.
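
The lifecycle described above can be illustrated with a toy sketch. This is not the SkeMex implementation, which the abstract does not specify. Every name here (`Skill`, `SkillRepository`, `lr`, `prune_below`, `min_uses`) is hypothetical. Keyword overlap stands in for retrieval, a single running average per skill stands in for context-dependent utility, and promotion of useful memories is left out for brevity.

```python
from dataclasses import dataclass

# Branch names follow the abstract: general, task-specific, action-level.
BRANCHES = ("general", "task", "action")


@dataclass
class Skill:
    name: str
    branch: str            # one of BRANCHES
    keywords: frozenset    # crude stand-in for a retrieval key (assumption)
    procedure: str         # the reusable procedural knowledge, as text
    utility: float = 0.0   # running estimate learned from feedback
    uses: int = 0          # number of times the skill has been assessed


class SkillRepository:
    """Hypothetical Read-Write-Assess-Govern loop over an in-memory store."""

    def __init__(self, lr=0.5, prune_below=-0.5, min_uses=2):
        self.skills = {}
        self.lr = lr                    # step size for utility updates
        self.prune_below = prune_below  # utility threshold for removal
        self.min_uses = min_uses        # evidence required before removal

    def write(self, skill):
        """Write: store a newly distilled skill under its branch."""
        if skill.branch not in BRANCHES:
            raise ValueError(f"unknown branch: {skill.branch}")
        self.skills[skill.name] = skill

    def read(self, context_words, k=2, weight=0.5):
        """Read: rank matching skills by relevance plus weighted utility."""
        ctx = set(context_words)

        def score(s):
            overlap = len(s.keywords & ctx) / max(len(s.keywords), 1)
            return overlap + weight * s.utility

        candidates = [s for s in self.skills.values() if s.keywords & ctx]
        return sorted(candidates, key=score, reverse=True)[:k]

    def assess(self, name, reward):
        """Assess: move the utility estimate toward the observed reward."""
        s = self.skills[name]
        s.uses += 1
        s.utility += self.lr * (reward - s.utility)

    def govern(self):
        """Govern: remove skills that are repeatedly harmful."""
        removed = [
            n for n, s in self.skills.items()
            if s.uses >= self.min_uses and s.utility < self.prune_below
        ]
        for n in removed:
            del self.skills[n]
        return removed
```

In this sketch, a skill that repeatedly receives negative feedback drifts below `prune_below` and is dropped by `govern()`, while a skill with positive feedback ranks higher in `read()` among equally relevant candidates. No model weights are involved; only the repository changes.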