LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers

Abstract

We introduce LoRAShop, the first framework for multi-concept image editing with LoRA models. LoRAShop builds on a key observation about the feature interaction patterns inside Flux-style diffusion transformers: concept-specific transformer features activate spatially coherent regions early in the denoising process. We use this observation to derive a disentangled latent mask for each concept in a prior forward pass, and we blend the corresponding LoRA weights only within the regions bounding the concepts to be personalized. The resulting edits seamlessly integrate multiple subjects or styles into the original scene while preserving global context, lighting, and fine details. Our experiments show that LoRAShop preserves identity better than the baselines. By eliminating retraining and external constraints, LoRAShop turns personalized diffusion models into a practical "photoshop-with-LoRAs" tool and opens new avenues for compositional visual storytelling and rapid creative iteration.
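The masked blending described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the activation scores, the threshold, and the argmax rule for making masks disjoint are all assumptions made for illustration, and a single linear layer stands in for the transformer. It only shows the general idea of applying each LoRA update to the tokens inside its own mask while leaving the other tokens on the base weights.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def disjoint_masks(activations, threshold=0.5):
    """Turn per-concept activation maps into non-overlapping binary masks.

    activations[c][t] is a hypothetical score for concept c at token t.
    Each token is assigned to its highest-scoring concept if that score
    reaches the threshold; otherwise it belongs to no concept.
    """
    n_tokens = len(activations[0])
    masks = [[0] * n_tokens for _ in activations]
    for t in range(n_tokens):
        best = max(range(len(activations)), key=lambda c: activations[c][t])
        if activations[best][t] >= threshold:
            masks[best][t] = 1
    return masks


def masked_lora_linear(tokens, base_w, loras, masks):
    """Apply base_w to every token and add a LoRA update B(Ax) only where
    the matching concept mask is 1. loras is a list of (A, B) pairs."""
    out = []
    for t, x in enumerate(tokens):
        y = matvec(base_w, x)
        for (a, b), mask in zip(loras, masks):
            if mask[t]:
                delta = matvec(b, matvec(a, x))
                y = [yi + di for yi, di in zip(y, delta)]
        out.append(y)
    return out


# Toy data: three tokens, two concepts, rank-1 LoRA factors (all made up).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
base_w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = ([[1.0, 1.0]], [[1.0], [0.0]])
lora_b = ([[1.0, 1.0]], [[0.0], [2.0]])
activations = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.2]]

masks = disjoint_masks(activations)
out = masked_lora_linear(tokens, base_w, [lora_a, lora_b], masks)
print(masks)
print(out)
```

In this toy setup the third token falls below the threshold for both concepts, so it passes through the base weights unchanged, which mirrors the claim that content outside the personalized regions is preserved.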
