

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

June 4, 2025
作者: Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
cs.AI

Abstract

Large language models (LLMs) often struggle with visualization tasks such as plotting charts and diagrams, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction-tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
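The self-debug protocol described in the abstract — generate code, execute it, and feed runtime errors back for revision — can be sketched as a simple execute-and-retry loop. This is an illustrative sketch, not the paper's actual harness; the names `run_plot_code`, `self_debug`, and the `generate` callback are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Optional, Tuple


def run_plot_code(code: str) -> Tuple[bool, str]:
    """Execute candidate plotting code in a subprocess; return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=30,
        )
        return proc.returncode == 0, proc.stderr
    finally:
        os.remove(path)


def self_debug(
    generate: Callable[[str, str], str],
    instruction: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask `generate` for code; on failure, re-prompt with the traceback."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(instruction, feedback)
        ok, stderr = run_plot_code(code)
        if ok:
            return code
        feedback = stderr  # runtime error becomes next-turn context
    return None
```

In the paper's setting `generate` would be a call to the fine-tuned model with the original instruction plus the captured traceback; here any callable with that signature works, which makes the loop easy to test in isolation.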