StarVector: Generating Scalable Vector Graphics Code from Images

Abstract

Scalable Vector Graphics (SVGs) have become integral to modern image rendering applications because of their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in web development and graphic design. Existing deep learning approaches to SVG modeling often struggle to generate complex SVGs and are restricted to simpler ones that require extensive processing and simplification. This paper introduces StarVector, a multimodal SVG generation model that integrates Code Generation Large Language Models (CodeLLMs) with vision models. Our approach uses a CLIP image encoder to extract visual representations from pixel-based images, which an adapter module then transforms into visual tokens. These visual tokens are prepended to the SVG token embeddings, and the StarCoder model models the resulting sequence with next-token prediction, learning to align the visual and code tokens. This enables StarVector to generate unrestricted SVGs that accurately represent pixel images. To evaluate StarVector's performance, we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods across multiple datasets and relevant metrics. Within this benchmark, we introduce novel datasets including SVG-Stack, a large-scale dataset of real-world SVG examples, and use it to pre-train StarVector as a large foundation model for SVGs. Our results demonstrate significant improvements in visual quality and complexity handling over current methods, marking a notable advance in SVG generation technology. Code and models: https://github.com/joanrod/star-vector
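The sequence construction described in the abstract (image features passed through an adapter, prepended to SVG token embeddings, then trained with next-token prediction) can be illustrated with a toy sketch. This is not the authors' implementation: the real model uses a CLIP encoder and StarCoder, whereas here random vectors stand in for both, and all names (`linear`, `build_sequence`, `next_token_targets`, `IGNORE`) and dimensions are hypothetical. Restricting training targets to SVG tokens is also an assumption; the abstract does not state how visual positions are treated in the loss.

```python
import random

IGNORE = -1  # hypothetical marker for positions that carry no training target


def linear(vec, weight):
    """Project one vector with a weight matrix of shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def build_sequence(image_features, adapter_weight, svg_token_ids, embedding_table):
    """Prepend adapted visual tokens to the SVG token embeddings."""
    # Adapter stand-in: map each image feature to the language model's width.
    visual_tokens = [linear(f, adapter_weight) for f in image_features]
    # Look up the embedding of each SVG code token.
    svg_embeddings = [embedding_table[t] for t in svg_token_ids]
    return visual_tokens + svg_embeddings


def next_token_targets(svg_token_ids, num_visual):
    """Target at position i is the token at position i + 1, SVG tokens only."""
    targets = [IGNORE] * (num_visual + len(svg_token_ids))
    for j, tok in enumerate(svg_token_ids):
        pos = num_visual + j - 1  # the position that must predict this token
        if pos >= 0:
            targets[pos] = tok
    return targets


# Toy stand-ins for encoder output, adapter weights, and the embedding table.
rng = random.Random(0)
feat_dim, model_dim, vocab_size = 6, 4, 10
image_features = [[rng.random() for _ in range(feat_dim)] for _ in range(3)]
adapter_weight = [[rng.random() for _ in range(feat_dim)] for _ in range(model_dim)]
embedding_table = [[rng.random() for _ in range(model_dim)] for _ in range(vocab_size)]
svg_token_ids = [5, 7, 2]  # placeholder ids for a tokenized SVG string

sequence = build_sequence(image_features, adapter_weight, svg_token_ids, embedding_table)
targets = next_token_targets(svg_token_ids, num_visual=len(image_features))
print(len(sequence), len(sequence[0]))  # 6 4
print(targets)  # [-1, -1, 5, 7, 2, -1]
```

In this sketch the last visual position predicts the first SVG token, and the final SVG position has no target because nothing follows it. At inference time the same layout would let a model generate SVG code autoregressively, conditioned only on the visual tokens.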
