Terminal Agents Suffice for Enterprise Automation

Abstract

There is growing interest in building agents that can interact with digital platforms to carry out meaningful enterprise tasks autonomously. Approaches explored so far include tool-augmented agents built on abstractions such as the Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Yet it remains unclear whether such complex agentic systems are necessary, given their cost and operational overhead. We argue that a coding agent equipped only with a terminal and a filesystem can solve many enterprise tasks more effectively by interacting directly with platform APIs. We evaluate this hypothesis across diverse real-world systems and show that these low-level terminal agents match or outperform more complex agent architectures. Our findings suggest that simple programmatic interfaces, combined with strong foundation models, are sufficient for practical enterprise automation.