VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
June 4, 2025
Authors: Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
cs.AI
Abstract
Large language models (LLMs) often struggle with visualization tasks such as
plotting charts and diagrams, where success depends on both code correctness and
visual semantics. Existing instruction-tuning datasets lack execution-grounded
supervision and offer limited support for iterative code correction, resulting
in fragile and unreliable plot generation. We present VisCode-200K, a
large-scale instruction tuning dataset for Python-based visualization and
self-correction. It contains over 200K examples from two sources: (1) validated
plotting code from open-source repositories, paired with natural language
instructions and rendered plots; and (2) 45K multi-turn correction dialogues
from Code-Feedback, enabling models to revise faulty code using runtime
feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create
VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly
outperforms strong open-source baselines and approaches the performance of
proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation
protocol to assess iterative repair, demonstrating the benefits of
feedback-driven learning for executable, visually accurate code generation.
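The self-debug evaluation protocol mentioned above can be pictured as a simple execute-and-repair loop: run the candidate plotting code, capture any traceback, and feed it back to the model for revision. Below is a minimal sketch of such a loop; the `generate` callable is a hypothetical stand-in for a model call, and the round limit and prompt wording are illustrative assumptions, not details from the paper.

```python
import os
import subprocess
import sys
import tempfile


def run_plot_code(code: str, timeout: int = 30) -> tuple[bool, str]:
    """Execute candidate code in a subprocess; return (success, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stderr
    finally:
        os.remove(path)


def self_debug(generate, instruction: str, max_rounds: int = 3) -> tuple[str, bool]:
    """Regenerate code until it executes, feeding runtime tracebacks back
    to the model. `generate(instruction, feedback)` is a hypothetical
    model interface returning a code string."""
    feedback = ""
    code = generate(instruction, feedback)
    for _ in range(max_rounds):
        ok, err = run_plot_code(code)
        if ok:
            return code, True
        # Hypothetical repair prompt: surface the traceback to the model.
        feedback = f"The previous code failed with:\n{err}\nPlease fix it."
        code = generate(instruction, feedback)
    ok, _ = run_plot_code(code)
    return code, ok
```

In practice the executed scripts would use a non-interactive matplotlib backend (e.g. `Agg`) so plots render without a display; the subprocess isolation above simply keeps faulty generated code from crashing the evaluation harness.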