Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
January 8, 2026
Authors: Mizanur Rahman, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque
cs.AI
Abstract
Text-to-Visualization (Text2Vis) systems translate natural language queries over tabular data into concise answers and executable visualizations. While closed-source LLMs generate functional code, the resulting charts often lack semantic alignment and clarity, qualities that can only be assessed post-execution. Open-source models struggle even more, frequently producing non-executable or visually poor outputs. Although supervised fine-tuning can improve code executability, it fails to enhance overall visualization quality, as traditional SFT loss cannot capture post-execution feedback. To address this gap, we propose RL-Text2Vis, the first reinforcement learning framework for Text2Vis generation. Built on Group Relative Policy Optimization (GRPO), our method uses a novel multi-objective reward that jointly optimizes textual accuracy, code validity, and visualization quality using post-execution feedback. By training Qwen2.5 models (7B and 14B), RL-Text2Vis achieves a 22% relative improvement in chart quality over GPT-4o on the Text2Vis benchmark and boosts code execution success from 78% to 97% relative to its zero-shot baseline. Our models significantly outperform strong zero-shot and supervised baselines and also demonstrate robust generalization to out-of-domain datasets like VIS-Eval and NVBench. These results establish GRPO as an effective strategy for structured, multimodal reasoning in visualization generation. We release our code at https://github.com/vis-nlp/RL-Text2Vis.
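The abstract does not give implementation details, but the two ideas it names, a multi-objective reward computed from post-execution feedback and GRPO-style group-relative advantages, can be illustrated with a minimal sketch. Everything below (the feedback fields, the weights, and the function names) is an illustrative assumption, not the authors' released implementation; see the linked repository for the actual code.

# Minimal sketch (assumed design, not the paper's code): combine post-execution
# signals into one scalar reward, then compute GRPO-style group-relative
# advantages over a group of sampled completions for the same query.

from dataclasses import dataclass
from typing import List


@dataclass
class PostExecutionFeedback:
    """Signals that are only available after running the generated plotting code."""
    answer_correct: float   # textual accuracy of the concise answer, in [0, 1]
    code_executed: bool     # did the generated code run without errors?
    chart_quality: float    # judged clarity/alignment of the rendered chart, in [0, 1]


def multi_objective_reward(fb: PostExecutionFeedback,
                           w_text: float = 0.3,
                           w_code: float = 0.3,
                           w_vis: float = 0.4) -> float:
    """Weighted combination of textual accuracy, code validity, and chart quality.

    The weights are placeholders. A non-executable program earns no visualization
    credit, since chart quality can only be assessed post-execution.
    """
    code_reward = 1.0 if fb.code_executed else 0.0
    vis_reward = fb.chart_quality if fb.code_executed else 0.0
    return w_text * fb.answer_correct + w_code * code_reward + w_vis * vis_reward


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each completion's reward against the
    mean and standard deviation of its own sampled group (no learned critic)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One query, a group of G = 4 sampled completions, each executed and scored.
    group = [
        PostExecutionFeedback(answer_correct=1.0, code_executed=True, chart_quality=0.9),
        PostExecutionFeedback(answer_correct=1.0, code_executed=True, chart_quality=0.4),
        PostExecutionFeedback(answer_correct=0.0, code_executed=True, chart_quality=0.7),
        PostExecutionFeedback(answer_correct=1.0, code_executed=False, chart_quality=0.0),
    ]
    rewards = [multi_objective_reward(fb) for fb in group]
    print("rewards:   ", rewards)
    print("advantages:", group_relative_advantages(rewards))

In this sketch, completions whose charts execute and score well receive positive group-relative advantages, while non-executable or low-quality ones are pushed down, which is the mechanism the abstract credits for improving both execution success and chart quality.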