Teaching an Agent to Sketch One Part at a Time

Abstract

We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent with supervised fine-tuning followed by a novel multi-turn process-reward reinforcement learning scheme. Our approach is enabled by a new dataset, ControlSketch-Part, which provides rich part-level annotations for sketches. The annotations are obtained with a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and giving the agent visual feedback throughout the drawing process enables interpretable, controllable, and locally editable text-to-vector sketch generation.
