VisCoder2: Building Multi-Language Visualization Coding Agents
October 24, 2025
Authors: Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen
cs.AI
Abstract
Large language models (LLMs) have recently enabled coding agents capable of
generating, executing, and revising visualization code. However, existing
models often fail in practical workflows due to limited language coverage,
unreliable execution, and lack of iterative correction mechanisms. Progress has
been constrained by narrow datasets and benchmarks that emphasize single-round
generation and single-language tasks. To address these challenges, we introduce
three complementary resources for advancing visualization coding agents.
VisCode-Multi-679K is a large-scale, supervised dataset containing 679K
validated and executable visualization samples with multi-turn correction
dialogues across 12 programming languages. VisPlotBench is a benchmark for
systematic evaluation, featuring executable tasks, rendered outputs, and
protocols for both initial generation and multi-round self-debug. Finally, we
present VisCoder2, a family of multi-language visualization models trained on
VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms
strong open-source baselines and approaches the performance of proprietary
models such as GPT-4.1. Iterative self-debug yields further gains, particularly
in symbolic or compiler-dependent languages, bringing the 32B model to an 82.4%
overall execution pass rate.
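The abstract describes an evaluation protocol with an initial generation step followed by multi-round self-debug, scored by execution pass rate. Below is a minimal sketch of how such a loop could work, under stated assumptions. It is not the VisPlotBench harness. The names self_debug, run_python, pass_rate and toy_model are hypothetical. Execution is checked only by running Python through exec, whereas the benchmark covers 12 languages with their own runtimes and renderers. A task also counts as passed here if its final attempt runs without raising, which is an assumed definition.

```python
import traceback
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str
    error: Optional[str]  # None when execution succeeded


def run_python(code: str) -> Optional[str]:
    """Hypothetical executor: run code in a fresh namespace, return the error text or None."""
    try:
        exec(code, {"__name__": "__sandbox__"})
    except Exception:
        return traceback.format_exc(limit=1)
    return None


def self_debug(task: str,
               generate: Callable[[str, list], str],
               execute: Callable[[str], Optional[str]] = run_python,
               max_rounds: int = 3) -> list:
    """Round 0 is the initial generation; each later round feeds prior errors back to the model."""
    history: list = []
    for _ in range(max_rounds + 1):
        code = generate(task, history)  # the model sees all earlier attempts and their errors
        error = execute(code)
        history.append(Attempt(code, error))
        if error is None:  # stop as soon as the code executes cleanly
            break
    return history


def pass_rate(results: list) -> float:
    """Assumed metric: fraction of tasks whose final attempt executed without error."""
    if not results:
        return 0.0
    passed = sum(1 for h in results if h and h[-1].error is None)
    return passed / len(results)


def toy_model(task: str, history: list) -> str:
    """Hypothetical stand-in for a model: buggy first draft, corrected after seeing an error."""
    if not history:
        return "values = [1, 2, 3]\ntotal = sum(valuez)"  # typo raises NameError
    return "values = [1, 2, 3]\ntotal = sum(values)"


if __name__ == "__main__":
    initial_only = self_debug("bar chart", toy_model, max_rounds=0)
    with_debug = self_debug("bar chart", toy_model, max_rounds=3)
    print(pass_rate([initial_only]), pass_rate([with_debug]))
```

Setting max_rounds=0 reproduces an initial-generation-only score, so the same loop can report both settings. The gap between the two numbers is what the self-debug gains in the abstract refer to.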