Experience Breeds Proficiency: Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Abstract

Medical agent systems are increasingly expected to support interactive clinical decision-making rather than static question answering alone. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, and difficult to govern. More importantly, they rarely distinguish which memories are truly useful for future reasoning, which limits their ability to accumulate compact and reliable experience for long-horizon clinical reasoning. To close this gap, we propose SkeMex, a post-deployment self-evolution framework that improves medical agents through a skill-based memory without updating model weights. SkeMex distills informative interaction trajectories into structured skills that encode reusable procedural knowledge, and organizes them into a multi-branch repository spanning general, task-specific, and action-level experience. To determine which memories should be reused and retained, SkeMex estimates context-dependent utility from environment feedback and uses it to guide value-aware retrieval and repository governance. A closed-loop "Read-Write-Assess-Govern" lifecycle further supports continual evolution by writing new skills, updating utilities, promoting useful memories, and removing harmful entries. Experiments across diverse clinical tasks show that SkeMex consistently outperforms representative memory-based agents in both offline and online settings. It also generalizes across model backbones and supports transferable skill memory. All data and code will be released publicly.
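As a rough illustration of the read, write, assess, and govern steps named in the abstract, the following minimal Python sketch shows one way such a lifecycle could be wired together. It is not the SkeMex implementation: the class and method names (`Skill`, `SkillRepository`, `read`, `write`, `assess`, `govern`), the keyword-overlap similarity, the exponential-moving-average utility update, and all thresholds are assumptions made for illustration. The sketch also keeps one scalar utility per skill rather than a context-dependent estimate, and it omits the promotion step.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical branch names mirroring the general / task-specific /
# action-level split described in the abstract.
BRANCHES = ("general", "task", "action")


@dataclass
class Skill:
    text: str                 # distilled procedural knowledge
    branch: str               # one of BRANCHES
    keywords: FrozenSet[str]  # crude stand-in for a learned retrieval key
    utility: float = 0.0      # running estimate from environment feedback
    uses: int = 0             # how many times feedback has been recorded


class SkillRepository:
    """Toy skill memory; every design choice here is an assumption."""

    def __init__(self, alpha: float = 0.5, prune_below: float = -0.5,
                 min_uses: int = 2) -> None:
        self.alpha = alpha              # step size of the utility update
        self.prune_below = prune_below  # utility threshold for removal
        self.min_uses = min_uses        # evidence required before removal
        self.skills: List[Skill] = []

    def write(self, text: str, branch: str, keywords: Iterable[str]) -> Skill:
        """Write: store a newly distilled skill in the given branch."""
        if branch not in BRANCHES:
            raise ValueError(f"unknown branch: {branch}")
        skill = Skill(text, branch, frozenset(keywords))
        self.skills.append(skill)
        return skill

    def read(self, query_keywords: Iterable[str], k: int = 2,
             weight: float = 0.5) -> List[Skill]:
        """Read: rank skills by a blend of similarity and utility."""
        query = set(query_keywords)

        def score(skill: Skill) -> float:
            union = query | skill.keywords
            # Jaccard overlap between query and skill keywords.
            sim = len(query & skill.keywords) / len(union) if union else 0.0
            return (1.0 - weight) * sim + weight * skill.utility

        return sorted(self.skills, key=score, reverse=True)[:k]

    def assess(self, skill: Skill, reward: float) -> None:
        """Assess: move the utility estimate toward the observed reward."""
        skill.uses += 1
        skill.utility += self.alpha * (reward - skill.utility)

    def govern(self) -> List[Skill]:
        """Govern: drop skills with enough evidence and low utility."""
        def harmful(skill: Skill) -> bool:
            return (skill.uses >= self.min_uses
                    and skill.utility < self.prune_below)

        removed = [s for s in self.skills if harmful(s)]
        self.skills = [s for s in self.skills if not harmful(s)]
        return removed
```

In this sketch the retrieval score mixes relevance with accumulated utility, so a skill that matches the query but has repeatedly led to poor outcomes is ranked lower and is eventually removed by `govern`. Requiring a minimum number of uses before removal is one simple way to avoid discarding a skill on a single bad outcome.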