CoDA: Agentic Systems for Collaborative Data Visualization

Abstract

Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations, highlighting the need for robust automation from natural language queries. Current systems, however, struggle with complex datasets that span multiple files and with iterative refinement. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task: they focus on initial query parsing and fail to robustly manage data complexity, code errors, or final visualization quality. In this paper, we reframe this challenge as a collaborative multi-agent problem. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection. We formalize this pipeline and demonstrate how metadata-focused analysis bypasses token limits and how quality-driven refinement ensures robustness. Extensive evaluations show that CoDA achieves substantial gains in the overall score, outperforming competitive baselines by up to 41.5%. This work demonstrates that the future of visualization automation lies not in isolated code generation but in integrated, collaborative agentic workflows.
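As a rough illustration of two ideas named in the abstract, the sketch below shows (1) a metadata-focused summary whose size depends on the number of columns rather than the number of rows, and (2) a quality-driven refinement loop that regenerates output until a score threshold is met. This is not CoDA's implementation: the function names `infer_type`, `summarize_csv`, and `refine`, the type heuristic, the 0.8 threshold, and the three-iteration budget are all assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse column type from string cells (illustrative heuristic)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def summarize_csv(text, n_examples=2):
    """Return compact metadata for one CSV file instead of its full contents.

    The summary holds one entry per column, so its size does not grow
    with the number of rows (hypothetical stand-in for metadata analysis).
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "n_rows": len(rows),
        "columns": {
            col: {
                "type": infer_type([r[col] for r in rows]),
                "examples": [r[col] for r in rows[:n_examples]],
            }
            for col in columns
        },
    }


def refine(generate, evaluate, threshold=0.8, max_iters=3):
    """Regenerate until the score reaches the threshold or the budget runs out.

    `generate(feedback)` returns a candidate; `evaluate(candidate)` returns
    a (score, feedback) pair. Both callables are hypothetical placeholders
    for a code-generation agent and a reflection agent.
    """
    feedback = None
    best_score, best_output = float("-inf"), None
    for _ in range(max_iters):
        output = generate(feedback)
        score, feedback = evaluate(output)
        if score > best_score:
            best_score, best_output = score, output
        if score >= threshold:
            break
    return best_output, best_score
```

In a pipeline of this shape, only the dictionary from `summarize_csv` would be placed in a prompt, and `refine` would wrap the generate-then-critique cycle; the real system may organize these steps differently.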