
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

June 4, 2025
作者: Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
cs.AI

Abstract

Large language models (LLMs) often struggle with visualization tasks such as plotting diagrams and charts, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction-tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
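The self-debug protocol described above can be sketched roughly as follows: execute the generated code, capture the runtime traceback on failure, and feed that feedback back to the model for a bounded number of repair rounds. This is a minimal illustration, not the paper's actual implementation; `generate` is a hypothetical stand-in for the LLM call, and real evaluation would additionally render and compare the resulting plots.

```python
import traceback
from typing import Callable, Optional, Tuple

def run_plot_code(code: str) -> Optional[str]:
    """Execute generated plotting code; return the traceback on failure, None on success."""
    try:
        exec(code, {"__name__": "__main__"})
        return None
    except Exception:
        return traceback.format_exc()

def self_debug(generate: Callable[[str, str], str],
               code: str,
               max_rounds: int = 3) -> Tuple[str, bool]:
    """Iteratively repair code using runtime feedback.

    `generate(code, error)` stands in for a model call that returns a
    revised program given the faulty code and its error message.
    Returns the final code and whether it executes successfully.
    """
    for _ in range(max_rounds):
        error = run_plot_code(code)
        if error is None:
            return code, True          # code runs; stop early
        code = generate(code, error)   # ask the model for a repair
    return code, run_plot_code(code) is None
```

In an actual evaluation harness, execution would happen in an isolated subprocess with a timeout rather than in-process `exec`, and success would also require the rendered figure to match the instruction semantically.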