CoDA: Agentic Systems for Collaborative Data Visualization
October 3, 2025
Authors: Zichen Chen, Jiefeng Chen, Sercan Ö. Arik, Misha Sra, Tomas Pfister, Jinsung Yoon
cs.AI
Abstract
Deep research has revolutionized data analysis, yet data scientists still
devote substantial time to manually crafting visualizations, highlighting the
need for robust automation from natural language queries. However, current
systems struggle with complex datasets containing multiple files and iterative
refinement. Existing approaches, including simple single- or multi-agent
systems, often oversimplify the task, focusing on initial query parsing while
failing to robustly manage data complexity, code errors, or final visualization
quality. In this paper, we reframe this challenge as a collaborative
multi-agent problem. We introduce CoDA, a multi-agent system that employs
specialized LLM agents for metadata analysis, task planning, code generation,
and self-reflection. We formalize this pipeline, demonstrating how
metadata-focused analysis bypasses token limits and quality-driven refinement
ensures robustness. Extensive evaluations show CoDA achieves substantial gains
in the overall score, outperforming competitive baselines by up to 41.5%. This
work demonstrates that the future of visualization automation lies not in
isolated code generation but in integrated, collaborative agentic workflows.
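
The abstract names two mechanisms: metadata-focused analysis, which avoids passing whole datasets to the model, and quality-driven refinement. The sketch below is a rough illustration of both ideas, not the authors' implementation. The function names `summarize_metadata` and `refine_until_good`, the `generate` and `score` callbacks, the 0.8 threshold, and the three-round cap are all assumptions made for this example; the abstract does not specify them.

```python
import csv
import io
from typing import Callable, Tuple


def summarize_metadata(csv_text: str, sample_rows: int = 2) -> dict:
    """Describe a CSV by schema, size, and a few sample rows (hypothetical helper).

    Only this compact summary would be placed in a prompt, so the prompt size
    no longer grows with the number of rows in the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        "columns": list(reader.fieldnames or []),
        "row_count": len(rows),
        "sample": rows[:sample_rows],
    }


def refine_until_good(
    generate: Callable[[dict, str], str],
    score: Callable[[str], float],
    metadata: dict,
    threshold: float = 0.8,  # assumed quality bar, not taken from the paper
    max_rounds: int = 3,     # assumed iteration cap, not taken from the paper
) -> Tuple[str, float, int]:
    """Regenerate plotting code with feedback until the score meets the threshold.

    Returns the best candidate seen, its score, and the number of rounds used.
    """
    feedback = ""
    best_code, best_score = "", float("-inf")
    rounds_used = 0
    for _ in range(max_rounds):
        rounds_used += 1
        code = generate(metadata, feedback)
        quality = score(code)
        if quality > best_score:
            best_code, best_score = code, quality
        if quality >= threshold:
            break
        # Pass the shortfall back so the next attempt can address it.
        feedback = f"score {quality:.2f} is below {threshold:.2f}; revise the plot"
    return best_code, best_score, rounds_used
```

In a real system, `generate` would wrap a code-generating LLM call and `score` would wrap an evaluator of the rendered chart; here both are left as plain callables so the control flow can be exercised with stubs.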