Learning to Act Under Noise: Improving Agent Robustness with Noisy Environments

Abstract

Recent advances in large language models (LLMs) have led to their widespread deployment as interactive agents capable of reasoning, planning, and tool use. Despite strong performance on existing benchmarks, such agents often degrade notably when deployed in real-world settings, where environments are inherently stochastic and imperfect. We argue that this discrepancy arises from a fundamental mismatch between idealized training settings and real-world interaction dynamics: current training paradigms rely on carefully curated task instructions and stable, well-controlled environments. To address this gap, we propose NoisyAgent, an agentic training framework that explicitly incorporates environmental imperfections into the agent learning process. We identify two major sources of interaction noise in real-world scenarios: user noise, which captures ambiguity and variability in user interaction, and tool noise, which reflects failures and anomalies in tool execution. We introduce these perturbations into the training pipeline by modifying user interaction patterns and simulating tool execution results within the training environment. To stabilize training while encouraging agents to handle increasingly challenging imperfections, we apply noise to only a subset of rollouts and raise its difficulty progressively as the model adapts to the current noise level. Extensive experiments show that our approach consistently improves agent robustness in noisy and dynamic environments. Our analysis further shows that training under noise also yields performance gains on idealized benchmarks, suggesting that controlled exposure to environmental noise promotes more generalizable reasoning and decision-making behaviors. These findings highlight the importance of modeling interaction imperfections for bridging the gap between agent training and real-world deployment.
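The abstract describes the noise schedule only in words, so the following is a minimal Python sketch of one way it could be read, not the authors' implementation. Every name here (`NoiseCurriculum`, `perturb_tool_result`, `perturb_user_message`, `run_rollout`) and every numeric default is a hypothetical choice made for illustration. It shows the two ideas stated above: noise reaches only a subset of rollouts, and the difficulty level rises once the agent copes with the current level.

```python
import random
from dataclasses import dataclass


@dataclass
class NoiseCurriculum:
    """Hypothetical scheduler; names and defaults are assumptions, not from the paper."""
    noise_fraction: float = 0.5     # assumed share of rollouts that receive noise
    level: int = 1                  # current difficulty level
    max_level: int = 3              # assumed number of difficulty levels
    promote_threshold: float = 0.7  # assumed success rate needed to advance

    def should_perturb(self, rng: random.Random) -> bool:
        # Only a subset of rollouts is perturbed; the rest stay clean.
        return rng.random() < self.noise_fraction

    def update(self, noisy_success_rate: float) -> None:
        # Raise the difficulty once the agent handles the current level.
        if noisy_success_rate >= self.promote_threshold and self.level < self.max_level:
            self.level += 1


def perturb_tool_result(result: str, level: int, rng: random.Random) -> str:
    """Tool noise: swap a result for a simulated failure (assumed failure model)."""
    failure_prob = 0.2 * level  # assumption: failures become likelier with level
    if rng.random() < failure_prob:
        return "ERROR: tool call timed out"
    return result


def perturb_user_message(message: str, level: int, rng: random.Random) -> str:
    """User noise: drop words so the request becomes more ambiguous (assumed model)."""
    drop_prob = 0.1 * level
    kept = [w for w in message.split() if rng.random() >= drop_prob]
    # Fall back to the original message if every word was dropped.
    return " ".join(kept) if kept else message


def run_rollout(curriculum: NoiseCurriculum, message: str, tool_result: str,
                rng: random.Random):
    """Return the (possibly perturbed) inputs for one rollout and a noise flag."""
    if curriculum.should_perturb(rng):
        return (
            perturb_user_message(message, curriculum.level, rng),
            perturb_tool_result(tool_result, curriculum.level, rng),
            True,
        )
    return message, tool_result, False
```

In a training loop under these assumptions, `run_rollout` would be called once per episode, and `update` would be called after each batch with the success rate measured on the perturbed rollouts only, so that clean rollouts do not trigger a promotion on their own.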