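Teaching an Agent to Sketch One Part at a Time

Abstract

We develop a method for producing vector sketches one part at a time. To do this, we train a multimodal language-model-based agent with supervised fine-tuning followed by a novel multi-turn process-reward reinforcement learning stage. Our approach is enabled by a new dataset, ControlSketch-Part, which contains rich part-level annotations for sketches. The annotations are obtained with a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enables interpretable, controllable, and locally editable text-to-vector sketch generation.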
