
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

May 5, 2026
作者: Siyuan Guo, Yali Du, Hechang Chen, Yi Chang, Jun Wang
cs.AI

Abstract

Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.
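The abstract describes experience reuse as a contextual bandit over an evolving episodic memory of cases. The sketch below illustrates that idea in miniature: cases are stored with embeddings and running reward estimates, and selection scores each case by query similarity plus a UCB-style exploration bonus. The `CaseMemory` class, the cosine-similarity retrieval, and the UCB1 bonus are all illustrative assumptions; the abstract does not specify CASCADE's actual scoring rule or memory format.

```python
import math


class CaseMemory:
    """Hypothetical sketch of bandit-style case selection for an LLM agent.

    Each stored case holds an embedding of the task it came from, a solution
    payload (e.g. a worked trajectory to include in the prompt), and running
    reward statistics. Selection trades off similarity to the current query
    against an exploration bonus, in the spirit of a contextual bandit.
    """

    def __init__(self, exploration=1.0):
        self.exploration = exploration  # weight on the exploration bonus
        self.cases = []

    def add_case(self, embedding, solution):
        """Store a new case with no pulls and zero empirical reward."""
        self.cases.append(
            {"emb": embedding, "solution": solution, "pulls": 0, "mean_reward": 0.0}
        )

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def select(self, query_emb):
        """Pick the case maximizing similarity + mean reward + UCB1 bonus."""
        total_pulls = sum(c["pulls"] for c in self.cases) + 1
        best, best_score = None, -math.inf
        for c in self.cases:
            if c["pulls"] == 0:
                bonus = math.inf  # force each case to be tried at least once
            else:
                bonus = self.exploration * math.sqrt(
                    2.0 * math.log(total_pulls) / c["pulls"]
                )
            score = self._cosine(query_emb, c["emb"]) + c["mean_reward"] + bonus
            if score > best_score:
                best, best_score = c, score
        return best

    def update(self, case, reward):
        """Incrementally update the selected case's empirical mean reward."""
        case["pulls"] += 1
        case["mean_reward"] += (reward - case["mean_reward"]) / case["pulls"]


# Usage: retrieve a case for a new query, observe task success, feed it back.
memory = CaseMemory()
memory.add_case([1.0, 0.0], "diagnosis walkthrough")
memory.add_case([0.0, 1.0], "code-repair trajectory")
case = memory.select([0.9, 0.1])     # retrieve a case for the current task
memory.update(case, reward=1.0)      # reinforce it if the episode succeeded
```

The UCB1 bonus is one standard way to obtain the kind of no-regret exploration-exploitation trade-off the abstract refers to; a faithful implementation would follow the paper's own algorithm and guarantees.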