VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
June 4, 2025
Authors: Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
cs.AI
Abstract
Large language models (LLMs) often struggle with visualization tasks such as
plotting charts and diagrams, where success depends on both code correctness and
visual semantics. Existing instruction-tuning datasets lack execution-grounded
supervision and offer limited support for iterative code correction, resulting
in fragile and unreliable plot generation. We present VisCode-200K, a
large-scale instruction tuning dataset for Python-based visualization and
self-correction. It contains over 200K examples from two sources: (1) validated
plotting code from open-source repositories, paired with natural language
instructions and rendered plots; and (2) 45K multi-turn correction dialogues
from Code-Feedback, enabling models to revise faulty code using runtime
feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create
VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly
outperforms strong open-source baselines and approaches the performance of
proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation
protocol to assess iterative repair, demonstrating the benefits of
feedback-driven learning for executable, visually accurate code generation.
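The self-debug evaluation protocol mentioned above can be pictured as a simple execute-and-repair loop: run the candidate plotting code, capture any traceback, and feed it back to the model for revision. Below is a minimal sketch of such a loop; the `generate` callable is a hypothetical stand-in for a model call, and the round limit and prompt wording are illustrative assumptions, not details from the paper.

```python
import os
import subprocess
import sys
import tempfile


def run_plot_code(code: str, timeout: int = 30) -> tuple[bool, str]:
    """Execute candidate code in a subprocess; return (success, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stderr
    finally:
        os.remove(path)


def self_debug(generate, instruction: str, max_rounds: int = 3) -> tuple[str, bool]:
    """Regenerate code until it executes, feeding runtime tracebacks back
    to the model. `generate(instruction, feedback)` is a hypothetical
    model interface returning a code string."""
    feedback = ""
    code = generate(instruction, feedback)
    for _ in range(max_rounds):
        ok, err = run_plot_code(code)
        if ok:
            return code, True
        # Hypothetical repair prompt: surface the traceback to the model.
        feedback = f"The previous code failed with:\n{err}\nPlease fix it."
        code = generate(instruction, feedback)
    ok, _ = run_plot_code(code)
    return code, ok
```

In practice the executed scripts would use a non-interactive matplotlib backend (e.g. `Agg`) so plots render without a display; the subprocess isolation above simply keeps faulty generated code from crashing the evaluation harness.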