StarVector: Generating Scalable Vector Graphics Code from Images

Abstract

Scalable Vector Graphics (SVGs) have become integral to modern image rendering applications because of their infinite scalability in resolution, versatility, and editability. SVGs are particularly popular in web development and graphic design. Existing deep learning approaches to SVG modeling often struggle to generate complex SVGs and are restricted to simpler ones that require extensive processing and simplification. This paper introduces StarVector, a multimodal SVG generation model that integrates Code Generation Large Language Models (CodeLLMs) with vision models. Our approach uses a CLIP image encoder to extract visual representations from pixel-based images, which an adapter module then transforms into visual tokens. These visual tokens are prepended to the SVG token embeddings, and the StarCoder model processes the resulting sequence with next-token prediction, learning to align the visual and code tokens. This enables StarVector to generate unrestricted SVGs that accurately represent pixel images. To evaluate StarVector's performance, we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods across multiple datasets and relevant metrics. Within this benchmark, we introduce novel datasets, including SVG-Stack, a large-scale dataset of real-world SVG examples, and use it to pre-train StarVector as a large foundation model for SVGs. Our results demonstrate significant improvements in visual quality and complexity handling over current methods, marking a notable advance in SVG generation technology. Code and models: https://github.com/joanrod/star-vector
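
The sequence construction described above (image features projected by an adapter, prepended to SVG token embeddings, then modeled with next-token prediction) can be illustrated with a toy sketch in plain Python. This is not the StarVector implementation: the helper names `project` and `build_sequence`, the tiny dimensions, the hand-written weights, and the choice to supervise only the SVG positions are all assumptions made for illustration.

```python
# Toy sketch: all sizes, weights, and helper names are illustrative assumptions,
# not taken from the StarVector code base.

def project(features, weight):
    """Adapter stand-in: linearly map each image feature vector to the LLM width.

    `weight` holds one column per output dimension, each as long as a feature vector.
    """
    return [[sum(f * w for f, w in zip(feat, col)) for col in weight]
            for feat in features]


def build_sequence(visual_tokens, svg_token_ids, embedding_table):
    """Prepend visual tokens to SVG token embeddings and derive next-token targets."""
    if not visual_tokens:
        raise ValueError("need at least one visual token")
    svg_embeds = [embedding_table[t] for t in svg_token_ids]
    inputs = visual_tokens + svg_embeds
    # Position i is trained to predict the token at position i + 1.
    # Assumption: only SVG tokens are supervised, so positions whose next
    # element is a visual token (or nothing) get no target (None).
    targets = [None] * (len(visual_tokens) - 1) + list(svg_token_ids) + [None]
    return inputs, targets


# Three fake "patch" features of width 4, standing in for image encoder output.
image_features = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]]
# Adapter weights mapping width 4 to a toy embedding width of 2.
adapter_weight = [[1.0, 2.0, 3.0, 4.0],
                  [0.5, 0.5, 0.5, 0.5]]
# Toy embedding table for a vocabulary of 10 SVG code tokens.
embedding_table = [[float(i), float(-i)] for i in range(10)]
svg_token_ids = [5, 7, 9]

visual_tokens = project(image_features, adapter_weight)
inputs, targets = build_sequence(visual_tokens, svg_token_ids, embedding_table)
print(targets)
```

In this sketch the last visual position is the first one with a target, so the model's first supervised prediction is the first SVG token, conditioned on the whole image. A real setup would typically also append an end-of-sequence token so that the final position has a target.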
