Learning to Act in Noise: Enhancing Agent Robustness via Noisy Environments

Abstract

Recent advances in large language models (LLMs) have led to their widespread deployment as interactive agents capable of reasoning, planning, and tool use. Despite strong performance on existing benchmarks, such agents often degrade notably when deployed in real-world settings, where environments are inherently stochastic and imperfect. We argue that this discrepancy arises from a fundamental mismatch between idealized training settings and real-world interaction dynamics: current paradigms rely on carefully curated task instructions and stable, well-controlled environments. To address this gap, we propose NoisyAgent, an agentic training framework that explicitly incorporates environmental imperfections into the agent learning process. We identify two major sources of interaction noise in real-world scenarios: user noise, which captures ambiguity and variability in user interaction, and tool noise, which reflects failures and anomalies in tool execution. We introduce these perturbations into the training pipeline by modifying user interaction patterns and simulating tool execution results within the training environment. To keep training stable while encouraging agents to handle increasingly challenging imperfections, noise is applied to only a subset of rollouts, and its difficulty is raised progressively as the model adapts to the current noise level. Extensive experiments demonstrate that our approach consistently improves agent robustness in noisy and dynamic environments. Our analysis further reveals that training under noise also improves performance on idealized benchmarks, suggesting that controlled exposure to environmental noise promotes more generalizable reasoning and decision-making behaviors. Our findings highlight the importance of modeling interaction imperfections for bridging the gap between agent training and real-world deployment.
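
The training recipe described above (noise on only a subset of rollouts, with difficulty raised as the model adapts) can be illustrated with a minimal sketch. This is not the NoisyAgent implementation. Every name here (`NoiseCurriculum`, `perturb_user_message`, `perturb_tool_result`) is hypothetical, and the noise fraction, promotion threshold, number of levels, and the specific perturbations are assumptions chosen only to make the idea concrete.

```python
import random
from dataclasses import dataclass


@dataclass
class NoiseCurriculum:
    """Hypothetical scheduler: decides which rollouts get noise and how hard it is."""
    noise_fraction: float = 0.5     # assumed share of rollouts that receive noise
    level: int = 0                  # current difficulty level (0 = no perturbation)
    max_level: int = 3              # assumed number of difficulty levels
    promote_threshold: float = 0.7  # assumed success rate needed to raise the level

    def should_perturb(self, rng: random.Random) -> bool:
        # Only a subset of rollouts is perturbed; the rest stay clean.
        return rng.random() < self.noise_fraction

    def update(self, noisy_success_rate: float) -> None:
        # Raise difficulty once the model copes with the current noise level.
        if noisy_success_rate >= self.promote_threshold and self.level < self.max_level:
            self.level += 1


def perturb_user_message(message: str, level: int) -> str:
    """Assumed form of user noise: drop trailing words so the request is vaguer."""
    words = message.split()
    keep = max(1, len(words) - level)
    return " ".join(words[:keep])


def perturb_tool_result(result: str, level: int, rng: random.Random) -> str:
    """Assumed form of tool noise: replace the result with a simulated failure."""
    failure_prob = 0.1 * level  # assumed: failure probability grows with the level
    if rng.random() < failure_prob:
        return "ERROR: tool execution failed (simulated)"
    return result
```

In a training loop, one would call `should_perturb` once per rollout, apply the two perturbation functions at the current `level` for the selected rollouts, and call `update` with the measured success rate on noisy rollouts after each evaluation round.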