Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Abstract

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework that harnesses a single LLM to function concurrently as both the agent and the environment, enabling a bootstrapped co-evolution. Role-Agent comprises two synergistic components: World-In-Agent (WIA) and Agent-In-World (AIW). In WIA, the LLM acts as the agent and predicts future states after each action; the alignment between predicted and actual states is then used as a process reward, encouraging environment-aware reasoning. In AIW, the LLM analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns, thereby reshaping the training data distribution for targeted practice. Experiments on multiple benchmarks show that Role-Agent consistently improves performance, yielding an average gain of over 4% over strong baselines.
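
The two mechanisms described above can be illustrated with a minimal sketch. The abstract does not specify how alignment is measured or how similar tasks are retrieved, so everything here is an assumption made for illustration: token-set Jaccard overlap stands in for the predicted-versus-actual state alignment used as the WIA process reward, and matching on a single failure-mode label stands in for AIW retrieval. The function names `state_alignment_reward` and `reweight_tasks`, the `boost` parameter, and the example labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def state_alignment_reward(predicted: str, actual: str) -> float:
    """WIA-style process reward (assumed metric): Jaccard overlap of token sets."""
    p = set(predicted.lower().split())
    a = set(actual.lower().split())
    if not p and not a:
        return 1.0  # two empty states are treated as perfectly aligned
    return len(p & a) / len(p | a)


def reweight_tasks(
    task_modes: dict[str, str],
    failed_modes: list[str],
    boost: float = 1.0,
) -> dict[str, float]:
    """AIW-style reshaping (assumed scheme): upweight tasks whose failure-mode
    label matches modes observed in failed trajectories, then normalize."""
    counts = Counter(failed_modes)  # missing labels count as 0
    weights = {task: 1.0 + boost * counts[mode] for task, mode in task_modes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


# Hypothetical usage: one step reward and one resampling distribution.
reward = state_alignment_reward("door is open", "door is closed")  # 0.5
probs = reweight_tasks(
    {"t1": "navigation", "t2": "tool_use", "t3": "navigation"},
    failed_modes=["navigation", "navigation"],
)
```

In this sketch, tasks tagged with a frequently observed failure mode receive a larger share of the sampling probability, which is one simple way to realize the "targeted practice" idea; a real system would presumably rely on the LLM itself for both the state comparison and the failure-mode analysis.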