Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Abstract

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, both of which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework in which a single LLM serves concurrently as both the agent and the environment, enabling bootstrapped co-evolution. Role-Agent comprises two synergistic components: World-In-Agent (WIA) and Agent-In-World (AIW). In WIA, the LLM acts as the agent and predicts the future state after each action; the alignment between the predicted and actual states is then used as a process reward, which encourages environment-aware reasoning. In AIW, the LLM analyzes failure modes in failed trajectories and retrieves tasks with similar failure patterns, thereby reshaping the training data distribution toward targeted practice. Experiments on multiple benchmarks show that Role-Agent consistently improves performance, with an average gain of more than 4% over strong baselines.
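The two mechanisms can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how alignment between predicted and actual states is measured, so a token-level Jaccard overlap stands in for it here. It also does not say how failure modes are represented, so they are modeled as plain string tags. The function names `alignment_reward` and `reweight_tasks`, the `boost` parameter, and the example tags are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def alignment_reward(predicted_state: str, actual_state: str) -> float:
    """WIA-style process reward (assumed form): Jaccard overlap of token sets.

    Returns a value in [0, 1]; 1.0 means the predicted and actual state
    descriptions contain exactly the same set of tokens.
    """
    pred = set(predicted_state.lower().split())
    actual = set(actual_state.lower().split())
    if not pred and not actual:
        return 1.0  # two empty descriptions are treated as perfectly aligned
    return len(pred & actual) / len(pred | actual)


def reweight_tasks(
    task_failure_tags: dict[str, set[str]],
    failed_tags: list[str],
    boost: float = 1.0,
) -> dict[str, float]:
    """AIW-style reshaping (assumed form): upweight tasks sharing observed failure modes.

    task_failure_tags maps each task id to the failure-mode tags associated with it.
    failed_tags lists the failure modes diagnosed from failed trajectories,
    with repeats counting as stronger evidence.
    Returns a normalized sampling distribution over tasks.
    """
    counts = Counter(failed_tags)
    raw = {
        task: 1.0 + boost * sum(counts[tag] for tag in tags)
        for task, tags in task_failure_tags.items()
    }
    total = sum(raw.values())
    return {task: weight / total for task, weight in raw.items()}


if __name__ == "__main__":
    # Three of the four distinct tokens match, so the reward is 0.75.
    print(alignment_reward("door is open", "the door is open"))

    tasks = {"t1": {"wrong_tool"}, "t2": {"loop"}, "t3": set()}
    print(reweight_tasks(tasks, ["wrong_tool", "wrong_tool", "loop"]))
```

In this sketch, every task keeps a base weight of 1.0, so tasks unrelated to recent failures are still sampled, only less often. A real system would likely replace the token overlap with a learned or LLM-judged similarity and retrieve tasks by semantic similarity of failure analyses rather than exact tag matches.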