Modeling Distinct Human Interaction in Web Agents
February 19, 2026
Authors: Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo, Venu Arvind Arangarajan, Tianyue Ou, Frank Xu, Shuyan Zhou, Graham Neubig, Jeffrey P. Bigham
cs.AI
Abstract
Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical decision points or requesting unnecessary confirmation. In this work, we introduce the task of modeling human intervention to support collaborative web task execution. We collect CowCorpus, a dataset of 400 real-user web navigation trajectories containing over 4,200 interleaved human and agent actions. We identify four distinct patterns of user interaction with agents -- hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. Leveraging these insights, we train language models (LMs) to anticipate when users are likely to intervene based on their interaction styles, yielding a 61.4-63.4% improvement in intervention prediction accuracy over base LMs. Finally, we deploy these intervention-aware models in live web navigation agents and evaluate them in a user study, finding a 26.5% increase in user-rated agent usefulness. Together, our results show that structured modeling of human intervention leads to more adaptive, collaborative agents.