

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

June 4, 2025
作者: Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
cs.AI

Abstract

Large language models (LLMs) often struggle with visualization tasks such as plotting charts and diagrams, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction-tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
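The self-debug protocol described in the abstract — generate code, execute it, and feed runtime errors back for revision — can be sketched as a simple execute-and-retry loop. This is an illustrative sketch, not the paper's actual harness; the names `run_plot_code`, `self_debug`, and the `generate` callback are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Optional, Tuple


def run_plot_code(code: str) -> Tuple[bool, str]:
    """Execute candidate plotting code in a subprocess; return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=30,
        )
        return proc.returncode == 0, proc.stderr
    finally:
        os.remove(path)


def self_debug(
    generate: Callable[[str, str], str],
    instruction: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask `generate` for code; on failure, re-prompt with the traceback."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(instruction, feedback)
        ok, stderr = run_plot_code(code)
        if ok:
            return code
        feedback = stderr  # runtime error becomes next-turn context
    return None
```

In the paper's setting `generate` would be a call to the fine-tuned model with the original instruction plus the captured traceback; here any callable with that signature works, which makes the loop easy to test in isolation.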