Learning to Act Amid Noise: Enhancing Agent Robustness through Noisy Environments

Abstract

Recent advances in large language models (LLMs) have enabled their widespread deployment as interactive agents capable of reasoning, planning, and tool use. Despite strong performance on existing benchmarks, such agents often degrade notably when deployed in real-world settings, where environments are inherently stochastic and imperfect. We argue that this discrepancy stems from a fundamental mismatch between idealized training settings and real-world interaction dynamics: current paradigms rely on carefully curated task instructions and stable, well-controlled environments. To address this gap, we propose NoisyAgent, an agentic training framework that explicitly incorporates environmental imperfections into the agent learning process. We identify two major sources of interaction noise in real-world scenarios: user noise, which captures ambiguity and variability in user interaction, and tool noise, which reflects failures and anomalies in tool execution. We introduce these perturbations into the training pipeline by modifying user interaction patterns and simulating tool execution results within the training environment. To stabilize training while encouraging agents to handle increasingly challenging imperfections, noise is applied to only a subset of rollouts, and its difficulty is progressively increased as the model adapts to the current noise level. Extensive experiments demonstrate that our approach consistently improves agent robustness in noisy and dynamic environments. Our analysis also reveals that training under noisy conditions yields performance gains on idealized benchmarks, suggesting that controlled exposure to environmental noise promotes more generalizable reasoning and decision-making behaviors. These findings highlight the importance of modeling interaction imperfections to bridge the gap between agent training and real-world deployment.
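The abstract names three mechanisms (user noise, tool noise, and a curriculum that applies noise to only a subset of rollouts and raises its difficulty as the agent adapts) without specifying how they are implemented. The following is a minimal, hypothetical sketch of how such a pipeline might be wired, not the NoisyAgent implementation. Every name and number here is an assumption for illustration: `perturb_user` models user noise by dropping sentences from an instruction, `perturb_tool` models tool noise as simulated errors or truncated outputs, and `NoiseCurriculum` (with its assumed `noisy_fraction`, `advance_threshold`, and `window` parameters) decides which rollouts are noisy and when to raise the noise level.

```python
import random
from dataclasses import dataclass, field


def perturb_user(message: str, level: int, rng: random.Random) -> str:
    """Hypothetical user noise: drop up to `level` sentences so the request becomes underspecified."""
    sentences = [s for s in message.split(". ") if s]
    # Always keep at least one sentence so the task is not erased entirely.
    for _ in range(min(level, len(sentences) - 1)):
        sentences.pop(rng.randrange(len(sentences)))
    return ". ".join(sentences)


def perturb_tool(output: str, level: int, rng: random.Random) -> str:
    """Hypothetical tool noise: with probability 0.2 * level, return an error or a truncated result."""
    p = min(0.2 * level, 1.0)
    if rng.random() >= p:
        return output  # tool call behaves normally
    if rng.random() < 0.5:
        return "ERROR: tool call failed (simulated timeout)"
    return output[: len(output) // 2]  # simulated partial response


@dataclass
class NoiseCurriculum:
    noisy_fraction: float = 0.5     # assumed share of rollouts per batch that receive noise
    max_level: int = 5              # assumed highest noise level
    advance_threshold: float = 0.7  # assumed success rate needed to raise the level
    window: int = 20                # number of recent noisy rollouts used to judge adaptation
    level: int = 1
    _recent: list = field(default_factory=list)

    def assign(self, num_rollouts: int, rng: random.Random) -> list[bool]:
        """Mark a random subset of a batch's rollouts as noisy; the rest run in the clean environment."""
        k = round(self.noisy_fraction * num_rollouts)
        flags = [True] * k + [False] * (num_rollouts - k)
        rng.shuffle(flags)
        return flags

    def update(self, noisy_successes: list[bool]) -> None:
        """Raise the noise level once the agent succeeds often enough on noisy rollouts at the current level."""
        self._recent.extend(noisy_successes)
        self._recent = self._recent[-self.window:]
        if len(self._recent) == self.window and sum(self._recent) / self.window >= self.advance_threshold:
            self.level = min(self.level + 1, self.max_level)
            self._recent.clear()  # re-measure success from scratch at the new level


if __name__ == "__main__":
    rng = random.Random(0)
    cur = NoiseCurriculum()
    flags = cur.assign(8, rng)  # 4 of the 8 rollouts are marked noisy
    task = "Book a flight to Paris. Use my saved card. Prefer a window seat."
    for noisy in flags:
        msg = perturb_user(task, cur.level, rng) if noisy else task
        print(noisy, "|", msg)
    cur.update([True] * 20)  # sustained success on noisy rollouts
    print("noise level:", cur.level)
```

Keeping part of each batch clean preserves a stable learning signal, while gating level increases on a window of noisy-rollout successes is one simple way to realize "progressively increased in difficulty as the model adapts"; the actual criterion used by the authors may differ.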