Learning to Act in Noisy Environments: Improving Agent Robustness through Noisy Environments

Abstract

Recent advances in large language models (LLMs) have led to their widespread deployment as interactive agents capable of reasoning, planning, and tool use. Despite strong performance on existing benchmarks, such agents often degrade notably when deployed in real-world settings, where environments are inherently stochastic and imperfect. We argue that this discrepancy arises from a fundamental mismatch between idealized training settings and real-world interaction dynamics: current paradigms rely on carefully curated task instructions and stable, well-controlled environments. To address this gap, we propose NoisyAgent, an agentic training framework that explicitly incorporates environmental imperfections into the agent learning process. We identify two major sources of interaction noise in real-world scenarios: user noise, which captures ambiguity and variability in user interaction, and tool noise, which reflects failures and anomalies in tool execution. We introduce these perturbations into the training pipeline by modifying user interaction patterns and simulating tool execution results within the training environment. To stabilize training while encouraging agents to handle increasingly challenging imperfections, noise is applied to only a subset of rollouts, and its difficulty is increased progressively as the model adapts to the current noise level. Extensive experiments demonstrate that our approach consistently improves agent robustness in noisy and dynamic environments. Our analysis further shows that training under noisy conditions also yields performance gains on idealized benchmarks, suggesting that controlled exposure to environmental noise promotes more generalizable reasoning and decision-making behaviors. These findings highlight the importance of modeling interaction imperfections for bridging the gap between agent training and real-world deployment.
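To make the training schedule described in the abstract more concrete, the following is a minimal Python sketch of one way such a scheme could look; it is not NoisyAgent's actual implementation. It covers only tool noise (simulated execution failures), not user noise. The class NoiseCurriculum, the function noisy_tool_call, and every parameter and value (noisy_fraction, promote_threshold, window, max_level, the failure-probability mapping) are hypothetical names and choices introduced here for illustration; the abstract does not specify any of them.

```python
import random
from dataclasses import dataclass, field


@dataclass
class NoiseCurriculum:
    """Hypothetical scheduler: noise on a subset of rollouts, harder over time."""
    noisy_fraction: float = 0.5     # assumed share of rollouts that receive noise
    promote_threshold: float = 0.7  # assumed success rate required to raise the level
    window: int = 20                # assumed number of noisy rollouts per evaluation
    max_level: int = 3              # assumed highest difficulty level
    level: int = 0                  # current noise difficulty level
    _recent: list = field(default_factory=list, init=False, repr=False)

    def sample_level(self, rng):
        """Return None for a clean rollout, else the current noise level."""
        if rng.random() < self.noisy_fraction:
            return self.level
        return None

    def record(self, noise_level, success):
        """Track noisy-rollout outcomes; raise the level once the agent adapts."""
        if noise_level is None:
            return  # clean rollouts do not count toward adaptation
        self._recent.append(bool(success))
        if len(self._recent) >= self.window:
            rate = sum(self._recent) / len(self._recent)
            if rate >= self.promote_threshold and self.level < self.max_level:
                self.level += 1
            self._recent.clear()


def noisy_tool_call(tool, args, noise_level, rng):
    """Run a tool, or return a simulated failure with level-dependent probability."""
    if noise_level is not None:
        failure_prob = 0.1 * (noise_level + 1)  # assumed mapping from level to rate
        if rng.random() < failure_prob:
            return {"ok": False, "error": "simulated tool failure"}
    return {"ok": True, "result": tool(**args)}
```

In a training loop under these assumptions, each rollout would call sample_level once, route its tool calls through noisy_tool_call with the sampled level, and report its outcome through record. Clean rollouts keep part of the training signal unperturbed, which is one plausible reading of how applying noise to only a subset of rollouts stabilizes training.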