Style Customization of Text-to-Vector Generation with Image Diffusion Priors

Abstract


Scalable Vector Graphics (SVGs) are highly favored by designers due to their resolution independence and well-organized layer structure. Although existing text-to-vector (T2V) generation methods can create SVGs from text prompts, they often overlook an important need in practical applications: style customization, which is vital for producing a collection of vector graphics with a consistent visual appearance and coherent aesthetics. Extending existing T2V methods to style customization is challenging. Optimization-based T2V models can utilize the priors of text-to-image (T2I) models for customization, but struggle to maintain structural regularity. On the other hand, feed-forward T2V models can ensure structural regularity, yet they have difficulty disentangling content and style due to limited SVG training data. To address these challenges, we propose a novel two-stage style customization pipeline for SVG generation that combines the advantages of feed-forward T2V models and T2I image priors. In the first stage, we train a T2V diffusion model with a path-level representation to ensure the structural regularity of SVGs while preserving diverse expressive capabilities. In the second stage, we customize the T2V diffusion model to different styles by distilling customized T2I models. By integrating these techniques, our pipeline can generate high-quality and diverse SVGs in custom styles from text prompts in an efficient feed-forward manner. The effectiveness of our method has been validated through extensive experiments. The project page is https://customsvg.github.io.
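The abstract does not spell out the exact path-level representation used in the first stage. As a minimal sketch, assuming each path is a closed shape made of a fixed number of cubic Bezier segments plus a solid RGB fill (an illustrative assumption, not the paper's published encoding), the code below shows how one path could be flattened into a fixed-length vector that a diffusion model might denoise, and how such a vector could be decoded back into SVG markup. The names `BezierPath` and `to_svg` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class BezierPath:
    # Hypothetical path-level unit (assumption): a closed shape of K cubic
    # Bezier segments, i.e. 3K + 1 control points whose last point equals
    # the first, plus an RGB fill color with channels in [0, 1].
    points: List[Point]
    fill: Tuple[float, float, float]

    def to_vector(self) -> List[float]:
        # Flatten control points and color into one fixed-length vector,
        # the kind of per-path token a diffusion model could operate on.
        flat = [c for p in self.points for c in p]
        return flat + list(self.fill)

    @staticmethod
    def from_vector(vec: List[float], num_segments: int) -> "BezierPath":
        # Decode a (possibly noisy) vector back into a path.
        n_pts = 3 * num_segments + 1
        coords = vec[: 2 * n_pts]
        pts = [(coords[2 * i], coords[2 * i + 1]) for i in range(n_pts)]
        # Clamp color channels, since sampled values may leave [0, 1].
        r, g, b = (min(max(c, 0.0), 1.0) for c in vec[2 * n_pts: 2 * n_pts + 3])
        return BezierPath(pts, (r, g, b))

    def to_svg_path(self) -> str:
        # Emit SVG path data: a moveto, one 'C' command per segment, closepath.
        x0, y0 = self.points[0]
        cmds = [f"M {x0:.1f} {y0:.1f}"]
        for i in range(1, len(self.points), 3):
            c1, c2, end = self.points[i], self.points[i + 1], self.points[i + 2]
            cmds.append(
                f"C {c1[0]:.1f} {c1[1]:.1f} {c2[0]:.1f} {c2[1]:.1f} "
                f"{end[0]:.1f} {end[1]:.1f}"
            )
        cmds.append("Z")
        d = " ".join(cmds)
        r, g, b = (round(255 * c) for c in self.fill)
        return f'<path d="{d}" fill="rgb({r},{g},{b})"/>'


def to_svg(paths: List[BezierPath], size: int = 256) -> str:
    # Stack paths as layers; later paths are drawn on top of earlier ones.
    body = "\n  ".join(p.to_svg_path() for p in paths)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">\n  {body}\n</svg>'
    )


if __name__ == "__main__":
    # A two-segment closed shape (7 control points) with an orange fill.
    blob = BezierPath(
        [(10, 10), (50, 0), (90, 10), (100, 50), (90, 90), (50, 100), (10, 10)],
        (1.0, 0.2, 0.0),
    )
    vec = blob.to_vector()  # 14 coordinates + 3 color channels
    print(len(vec))
    print(to_svg([BezierPath.from_vector(vec, num_segments=2)]))
```

Keeping every path at the same number of segments gives each path a vector of identical length, which is one simple way to make a set of paths look like a batch of tokens to a denoiser while still decoding into individually editable SVG layers.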
