Mixture of Style Experts for Diverse Image Stylization

Abstract



Diffusion-based stylization has advanced significantly, yet existing methods are limited to color-driven transformations, neglecting complex semantics and material details. We introduce StyleExpert, a semantic-aware framework based on the Mixture of Experts (MoE). Our framework employs a unified style encoder, trained on our large-scale dataset of content-style-stylized triplets, to embed diverse styles into a consistent latent space. This embedding is then used to condition a similarity-aware gating mechanism, which dynamically routes styles to specialized experts within the MoE architecture. Leveraging this MoE architecture, our method adeptly handles diverse styles spanning multiple semantic levels, from shallow textures to deep semantics. Extensive experiments show that StyleExpert outperforms existing approaches in preserving semantics and material details, while generalizing to unseen styles. Our code and collected images are available at the project page: https://hh-lg.github.io/StyleExpert-Page/.
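The abstract does not specify how the similarity-aware gate scores experts, so the sketch below is only one plausible reading, not the authors' implementation. It assumes each expert owns a learnable key vector (the hypothetical `expert_keys`), scores experts by cosine similarity between the style embedding and those keys, keeps the `top_k` best matches, and mixes their outputs with temperature-scaled softmax weights. All function names, the temperature, and the top-k rule are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; lower temperature gives sharper routing."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_gate(
    style_emb: Sequence[float],
    expert_keys: Sequence[Sequence[float]],  # hypothetical per-expert keys
    top_k: int = 2,
    temperature: float = 0.1,
) -> List[Tuple[int, float]]:
    """Assumed gate: route a style embedding to its top_k most similar experts."""
    sims = [cosine_similarity(style_emb, key) for key in expert_keys]
    # Indices of the top_k experts, ordered from most to least similar.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:top_k]
    weights = softmax([sims[i] for i in top], temperature)
    return list(zip(top, weights))


def moe_forward(
    x: Vector,
    style_emb: Sequence[float],
    expert_keys: Sequence[Sequence[float]],
    experts: Sequence[Expert],
    top_k: int = 2,
    temperature: float = 0.1,
) -> Tuple[Vector, List[Tuple[int, float]]]:
    """Weighted sum of the selected experts' outputs (sparse MoE layer sketch)."""
    routes = similarity_gate(style_emb, expert_keys, top_k, temperature)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, routes


if __name__ == "__main__":
    # Toy setup: three experts with 2-D keys; the style embedding is closest to key 0.
    keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    experts = [
        lambda v: [2.0 * t for t in v],
        lambda v: [-t for t in v],
        lambda v: [t + 1.0 for t in v],
    ]
    out, routes = moe_forward([1.0, 2.0, 3.0], [1.0, 0.1], keys, experts)
    print(routes)
    print(out)
```

With a low temperature the gate behaves almost like hard routing, while a higher temperature blends several experts; that trade-off is one way a single layer could cover both texture-level and semantic-level styles, though the paper's actual choice is not stated here.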

