A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

October 27, 2025
Authors: Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, Chengliang Chai, Chong Chen, Shimin Di, Ju Fan, Ji Sun, Nan Tang, Fugee Tsung, Jiannan Wang, Chenglin Wu, Yanwei Xu, Shaolei Zhang, Yong Zhang, Xuanhe Zhou, Guoliang Li, Yuyu Luo
cs.AI

Abstract

The rapid advancement of large language models (LLMs) has spurred the emergence of data agents--autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent" currently suffers from terminological ambiguity and inconsistent adoption, conflating simple query responders with sophisticated autonomous architectures. This terminological ambiguity fosters mismatched user expectations, accountability challenges, and barriers to industry growth. Inspired by the SAE J3016 standard for driving automation, this survey introduces the first systematic hierarchical taxonomy for data agents, comprising six levels that delineate and trace progressive shifts in autonomy, from manual operations (L0) to a vision of generative, fully autonomous data agents (L5), thereby clarifying capability boundaries and responsibility allocation. Through this lens, we offer a structured review of existing research arranged by increasing autonomy, encompassing specialized data agents for data management, preparation, and analysis, alongside emerging efforts toward versatile, comprehensive systems with enhanced autonomy. We further analyze critical evolutionary leaps and technical gaps for advancing data agents, especially the ongoing L2-to-L3 transition, where data agents evolve from procedural execution to autonomous orchestration. Finally, we conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.