MagicQuill: An Intelligent Interactive Image Editing System

Abstract

Image editing involves a variety of complex tasks and requires efficient and precise manipulation techniques. In this paper, we present MagicQuill, an integrated image editing system that enables swift actualization of creative ideas. Our system features a streamlined yet functionally robust interface, allowing for the articulation of editing operations (e.g., inserting elements, erasing objects, altering color) with minimal input. These interactions are monitored by a multimodal large language model (MLLM) to anticipate editing intentions in real time, bypassing the need for explicit prompt entry. Finally, we apply a powerful diffusion prior, enhanced by a carefully learned two-branch plug-in module, to process editing requests with precise control. Experimental results demonstrate the effectiveness of MagicQuill in achieving high-quality image edits. Please visit https://magic-quill.github.io to try out our system.
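The abstract does not specify data formats or interfaces. The following is a minimal sketch of how the "no explicit prompt" interaction could be modeled. It assumes that every stroke is tagged with one of the three operations named above (insert, erase, recolor) and that the MLLM can be treated as a function mapping strokes to a short text prompt. All names here (`BrushKind`, `Stroke`, `EditRequest`, `build_edit_request`, `toy_predictor`) are hypothetical and are not MagicQuill's actual API. The diffusion stage and its two-branch plug-in module are omitted entirely.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class BrushKind(Enum):
    # Hypothetical stroke types mirroring the operations listed in the abstract
    INSERT = "insert"    # add new content along the stroke
    ERASE = "erase"      # remove content under the stroke
    RECOLOR = "recolor"  # change the color of the stroked region


@dataclass
class Stroke:
    kind: BrushKind
    points: List[Tuple[int, int]]                # pixel coordinates of the stroke
    color: Optional[Tuple[int, int, int]] = None  # RGB, required only for RECOLOR


@dataclass
class EditRequest:
    strokes: List[Stroke]
    prompt: str  # filled in automatically by the intent predictor, not typed by the user


def build_edit_request(
    strokes: List[Stroke],
    predict_intent: Callable[[List[Stroke]], str],
) -> EditRequest:
    """Validate raw strokes and attach a predicted prompt.

    `predict_intent` stands in for the MLLM (an assumption): it looks at
    the strokes and returns a text prompt, so no explicit prompt entry
    is needed from the user.
    """
    if not strokes:
        raise ValueError("at least one stroke is required")
    for s in strokes:
        if s.kind is BrushKind.RECOLOR and s.color is None:
            raise ValueError("recolor strokes need a color")
    return EditRequest(strokes=list(strokes), prompt=predict_intent(strokes))


def toy_predictor(strokes: List[Stroke]) -> str:
    # Hypothetical placeholder: a real system would query an MLLM with the
    # image and strokes; here we just name the operations that were used.
    kinds = sorted({s.kind.value for s in strokes})
    return " and ".join(kinds)


if __name__ == "__main__":
    req = build_edit_request([Stroke(BrushKind.ERASE, [(10, 10), (20, 20)])], toy_predictor)
    print(req.prompt)  # prints "erase"
```

Passing the predictor in as a function keeps the interaction layer independent of any particular model, so the toy placeholder can be swapped for a real MLLM call without changing how strokes are collected or validated.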
