DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

April 28, 2026
Authors: Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu
cs.AI

Abstract

Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and an assumption of perfect intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks designed to evaluate DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet, for native spreadsheet manipulation, including chart and dashboard creation as well as diagnostic repair; DV-Evolution, for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact, for proactive intent alignment with a user simulator that mimics real-world ambiguous requirements. Our hybrid evaluation framework integrates Table-value Alignment for numerical precision and MLLM-as-a-Judge with rubrics for semantic-visual assessment. Experiments reveal that state-of-the-art models achieve less than 50% overall performance, exposing critical deficits in handling the complex challenges of real-world data visualization. DV-World provides a realistic testbed to steer development toward the versatile expertise required in enterprise workflows. Our data and code are available at https://github.com/DA-Open/DV-World.
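
The abstract names Table-value Alignment as the numerical half of the hybrid evaluation but does not define it here. The sketch below is only one plausible reading, assuming the check compares the values underlying a generated chart against a reference table, cell by cell, within a tolerance. The function name `table_value_alignment`, the keyed-dictionary table layout, and the tolerance are all hypothetical and are not taken from the DV-World code.

```python
import math


def table_value_alignment(predicted, reference, rel_tol=1e-6):
    """Return the fraction of reference cells reproduced by `predicted`.

    Hypothetical metric for illustration only; not the DV-World definition.
    Both tables map a (row_label, column_label) key to a numeric value.
    """
    if not reference:
        return 0.0
    matched = 0
    for key, ref_value in reference.items():
        pred_value = predicted.get(key)
        if pred_value is None:
            # A missing cell counts as a mismatch.
            continue
        if math.isclose(pred_value, ref_value, rel_tol=rel_tol, abs_tol=1e-9):
            matched += 1
    return matched / len(reference)


# Toy data invented for this example.
reference = {("Q1", "revenue"): 120.0, ("Q2", "revenue"): 135.5}
predicted = {("Q1", "revenue"): 120.0, ("Q2", "revenue"): 140.0}
score = table_value_alignment(predicted, reference)
```

In this toy case one of the two reference cells matches, so `score` is 0.5. A value-level check of this kind can only cover numerical fidelity; layout, labeling, and readability are the part the abstract assigns to the rubric-based MLLM judge.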