A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

October 27, 2025
Authors: Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, Chengliang Chai, Chong Chen, Shimin Di, Ju Fan, Ji Sun, Nan Tang, Fugee Tsung, Jiannan Wang, Chenglin Wu, Yanwei Xu, Shaolei Zhang, Yong Zhang, Xuanhe Zhou, Guoliang Li, Yuyu Luo
cs.AI

Abstract

The rapid advancement of large language models (LLMs) has spurred the emergence of data agents: autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent" currently suffers from terminological ambiguity and inconsistent adoption, conflating simple query responders with sophisticated autonomous architectures. This ambiguity fosters mismatched user expectations, accountability challenges, and barriers to industry growth. Inspired by the SAE J3016 standard for driving automation, this survey introduces the first systematic hierarchical taxonomy for data agents. It comprises six levels that delineate and trace progressive shifts in autonomy, from manual operations (L0) to a vision of generative, fully autonomous data agents (L5), thereby clarifying capability boundaries and responsibility allocation. Through this lens, we offer a structured review of existing research arranged by increasing autonomy, encompassing specialized data agents for data management, preparation, and analysis, alongside emerging efforts toward versatile, comprehensive systems with enhanced autonomy. We further analyze critical evolutionary leaps and technical gaps for advancing data agents, especially the ongoing L2-to-L3 transition, in which data agents evolve from procedural execution to autonomous orchestration. Finally, we conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.
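The six-level taxonomy is an ordered scale, much like the SAE J3016 driving-automation levels. The sketch below encodes it as an ordered enumeration. It is illustrative only. The abstract characterizes just L0 (manual operations), L2 (procedural execution), L3 (autonomous orchestration), and L5 (generative, fully autonomous), so L1 and L4 are left as bare placeholders here. The class name `DataAgentLevel` and the helper `orchestrates_autonomously` are hypothetical. The helper also assumes that every level at or above L3 retains autonomous orchestration.

```python
from enum import IntEnum


class DataAgentLevel(IntEnum):
    """Hypothetical encoding of the six autonomy levels L0-L5.

    Only L0, L2, L3 and L5 are characterized in the abstract;
    L1 and L4 are placeholders with no stated description.
    """
    L0 = 0  # manual operations
    L1 = 1  # not described in the abstract
    L2 = 2  # procedural execution
    L3 = 3  # autonomous orchestration
    L4 = 4  # not described in the abstract
    L5 = 5  # generative, fully autonomous data agent


def orchestrates_autonomously(level: DataAgentLevel) -> bool:
    # Assumption: the L2-to-L3 transition marks the shift from procedural
    # execution to autonomous orchestration, and higher levels keep it.
    return level >= DataAgentLevel.L3


if __name__ == "__main__":
    for lvl in DataAgentLevel:
        print(lvl.name, orchestrates_autonomously(lvl))
```

Because `IntEnum` members compare as integers, "increasing autonomy" reduces to ordinary ordering. Under the stated assumption, that ordering places the L2-to-L3 boundary at a single comparison.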
December 31, 2025