Step-by-Step Teaching: Guiding an Agent to Draw Sketches Part by Part

Abstract

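We propose a method for generating vector sketches part by part. Following supervised fine-tuning, the method trains a multimodal language model agent with a novel multi-turn process-reward reinforcement learning strategy. The work is made possible by our ControlSketch-Part dataset, built with a novel, generic automatic annotation pipeline that uses a structured multi-stage labeling process to segment vector sketches into semantic parts and assign paths to each part, yielding rich part-level sketch annotations. Experimental results show that incorporating structured part-level data and giving the agent visual feedback during generation enable interpretable, controllable, and locally editable text-to-vector sketch generation.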

English

We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using a novel multi-turn process-reward reinforcement learning procedure following supervised fine-tuning. Our approach is enabled by a new dataset we call ControlSketch-Part, which contains rich part-level annotations for sketches. The annotations are obtained with a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enable interpretable, controllable, and locally editable text-to-vector sketch generation.
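To make the idea of part-level structure concrete, the following is a minimal sketch of how a part-annotated vector sketch might be represented and edited locally. It is an illustration only, not the ControlSketch-Part format or the authors' implementation: the class names `Path`, `Part`, and `PartSketch`, the use of SVG path strings, and the example coordinates are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    # One vector stroke, stored as an SVG path "d" string (assumed format).
    d: str


@dataclass
class Part:
    # A semantic part: a label plus the paths assigned to it.
    label: str
    paths: list[Path] = field(default_factory=list)


@dataclass
class PartSketch:
    # Hypothetical container: a text prompt and an ordered list of parts.
    caption: str
    parts: list[Part] = field(default_factory=list)

    def add_part(self, part: Part) -> None:
        # One generation turn appends one part to the sketch.
        self.parts.append(part)

    def replace_part(self, label: str, new_paths: list[Path]) -> None:
        # Local edit: swap the paths of one part and leave the others untouched.
        for part in self.parts:
            if part.label == label:
                part.paths = new_paths
                return
        raise KeyError(label)

    def to_svg(self, size: int = 256) -> str:
        # Group paths by part so each part stays addressable in the output.
        groups = []
        for part in self.parts:
            body = "".join(
                f'<path d="{p.d}" fill="none" stroke="black"/>' for p in part.paths
            )
            groups.append(f'<g id="{part.label}">{body}</g>')
        header = f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        return header + "".join(groups) + "</svg>"


# Build a sketch part by part, then edit only the tail.
sketch = PartSketch(caption="a cat")
sketch.add_part(Part("head", [Path("M 100 60 C 80 40, 140 40, 130 70")]))
sketch.add_part(Part("tail", [Path("M 180 160 C 210 140, 220 110, 200 90")]))
sketch.replace_part("tail", [Path("M 180 160 C 200 170, 230 170, 240 150")])
```

Keeping paths grouped under a part label is what makes a local edit cheap in this sketch: replacing the tail touches one group and leaves the head's paths unchanged.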