MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
March 17, 2026
Authors: Peng Xia, Jianwen Chen, Xinyu Yang, Haoqin Tu, Jiaqi Liu, Kaiwen Xiong, Siwei Han, Shi Qiu, Haonian Ji, Yuyin Zhou, Zeyu Zheng, Cihang Xie, Huaxiu Yao
cs.AI
Abstract
Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. On platforms like OpenClaw, which handle diverse workloads across 20+ channels, existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. We present MetaClaw, a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills. MetaClaw employs two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories via an LLM evolver to synthesize new skills, enabling immediate improvement with zero downtime. Opportunistic policy optimization performs gradient-based updates via cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM). This is triggered during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors system inactivity and calendar data. These mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. To prevent data contamination, a versioning mechanism separates support and query data. Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without local GPUs. Experiments on MetaClaw-Bench and AutoResearchClaw show that skill-driven adaptation improves accuracy by up to 32% relative. The full pipeline advances Kimi-K2.5 accuracy from 21.4% to 40.6% and increases composite robustness by 18.3%. Code is available at https://github.com/aiming-lab/MetaClaw.
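The abstract's three mechanisms can be illustrated schematically: an opportunistic scheduler that gates gradient updates on user inactivity and calendar data, skill synthesis from failure trajectories, and a versioning split that keeps support and query data separate. The sketch below is a minimal illustration of those ideas, not the authors' implementation; all class and function names (`OMLS`, `SkillLibrary`, `split_support_query`) and the idle-threshold parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trajectory:
    task: str
    success: bool
    policy_version: int  # version of the policy that produced this rollout


@dataclass
class SkillLibrary:
    """Reusable behavioral skills distilled from failure trajectories."""
    skills: List[str] = field(default_factory=list)

    def synthesize_from_failures(self, trajs: List[Trajectory]) -> List[str]:
        # Stand-in for the LLM evolver: in the paper this step analyzes
        # failure trajectories and writes new skills; here we just tag tasks.
        new = [f"skill::{t.task}" for t in trajs if not t.success]
        self.skills.extend(new)
        return new


class OMLS:
    """Hypothetical Opportunistic Meta-Learning Scheduler: trigger policy
    optimization only during user-inactive windows."""

    def __init__(self, idle_threshold_s: int = 1800):
        self.idle_threshold_s = idle_threshold_s

    def should_optimize(self, seconds_idle: int, calendar_busy: bool) -> bool:
        # Fire only when the system has been idle long enough and the
        # user's calendar shows no upcoming activity.
        return seconds_idle >= self.idle_threshold_s and not calendar_busy


def split_support_query(
    trajs: List[Trajectory], current_version: int
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Versioning against contamination: support data comes from older
    policy versions, query data only from the current version."""
    support = [t for t in trajs if t.policy_version < current_version]
    query = [t for t in trajs if t.policy_version == current_version]
    return support, query
```

The virtuous cycle described in the abstract would then correspond to alternating these calls: skills synthesized from failures improve rollouts immediately (zero downtime), while the scheduler later uses those richer trajectories, split by version, for gradient-based policy updates.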