DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
April 28, 2026
Authors: Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu
cs.AI
Abstract
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and the assumption of perfect intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks designed to evaluate DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet, for native spreadsheet manipulation, including chart and dashboard creation as well as diagnostic repair; DV-Evolution, for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact, for proactive intent alignment with a user simulator that mimics real-world ambiguous requirements. Our hybrid evaluation framework integrates Table-value Alignment for numerical precision and MLLM-as-a-Judge with rubrics for semantic-visual assessment. Experiments reveal that state-of-the-art models achieve less than 50% overall performance, exposing critical deficits in handling the complex challenges of real-world data visualization. DV-World provides a realistic testbed to steer development toward the versatile expertise required in enterprise workflows. Our data and code are available at https://github.com/DA-Open/DV-World.
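The abstract names the two components of the hybrid evaluation but does not specify how they are computed or combined. The sketch below is only an illustration of the general idea, not the benchmark's actual scoring: it assumes that table-value alignment can be approximated as the fraction of expected label/value pairs reproduced in the chart's underlying data, that a judge returns a pass/fail verdict per rubric item, and that the two scores are blended with an equal weight. The function names, the tolerance, and the weight are all hypothetical.

```python
from math import isclose


def table_value_alignment(expected, extracted, rel_tol=1e-6):
    """Fraction of expected (label -> value) pairs found in the chart data.

    Hypothetical stand-in for a numerical-precision check; the real metric
    may differ.
    """
    if not expected:
        return 0.0
    hits = sum(
        1
        for key, value in expected.items()
        if key in extracted and isclose(extracted[key], value, rel_tol=rel_tol)
    )
    return hits / len(expected)


def rubric_score(rubric_results):
    """Share of rubric items a judge marked as satisfied (True/False each).

    The verdicts are assumed to come from an external judge model; no model
    is called here.
    """
    if not rubric_results:
        return 0.0
    return sum(rubric_results.values()) / len(rubric_results)


def hybrid_score(expected, extracted, rubric_results, weight=0.5):
    """Blend numeric and rubric scores; the 0.5 weight is an assumption."""
    numeric = table_value_alignment(expected, extracted)
    visual = rubric_score(rubric_results)
    return weight * numeric + (1 - weight) * visual


# Example with made-up values: one of two data points matches,
# and three of four rubric items pass.
expected = {"Q1": 100.0, "Q2": 150.0}
extracted = {"Q1": 100.0, "Q2": 140.0}
rubric = {"has_title": True, "axes_labeled": True, "legend_present": False, "correct_chart_type": True}
score = hybrid_score(expected, extracted, rubric)
```

Keeping the numeric check separate from the rubric verdicts means a chart that looks right but plots wrong values, or the reverse, is penalized on the relevant axis rather than averaged away inside a single judgment.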