

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

May 5, 2026
Authors: Siyuan Guo, Yali Du, Hechang Chen, Yi Chang, Jun Wang
cs.AI

Abstract

Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.
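The abstract frames experience reuse as a contextual bandit over stored cases, balancing exploration of untried episodes against exploitation of cases that have paid off. The paper does not specify the algorithm here, so the following is a minimal, hypothetical sketch of that idea: a UCB1-style selector whose score combines context similarity with an empirical reward mean and an exploration bonus. All class and method names (`CaseBandit`, `select`, `update`) are illustrative, not CASCADE's actual interface.

```python
import math


class CaseBandit:
    """Toy contextual-bandit case selector (UCB1-style).

    A hypothetical sketch of deployment-time experience reuse:
    each stored case is an arm; its score blends similarity to the
    current query, its average observed reward, and a UCB bonus
    that favours rarely reused cases.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha    # exploration weight
        self.cases = []       # stored (embedding, text) episodes
        self.counts = []      # times each case was reused
        self.rewards = []     # cumulative reward per case
        self.t = 0            # total selections so far

    def add_case(self, embedding, text):
        self.cases.append((embedding, text))
        self.counts.append(0)
        self.rewards.append(0.0)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)

    def select(self, query_embedding):
        """Return index of the case maximising similarity + mean reward + UCB bonus."""
        self.t += 1
        best, best_score = None, -float("inf")
        for i, (emb, _) in enumerate(self.cases):
            mean = self.rewards[i] / self.counts[i] if self.counts[i] else 0.0
            bonus = self.alpha * math.sqrt(
                2 * math.log(self.t) / (self.counts[i] or 1e-9)
            )
            score = self._cosine(query_embedding, emb) + mean + bonus
            if score > best_score:
                best, best_score = i, score
        return best

    def update(self, i, reward):
        """Record the outcome of reusing case i (e.g. task success = 1.0)."""
        self.counts[i] += 1
        self.rewards[i] += reward
```

On the first step the bonus is zero (log 1 = 0), so the most similar case wins; once any case accumulates reward, unseen cases receive a large bonus and get explored, which is the mechanism behind the no-regret behaviour the abstract alludes to.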