LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Abstract



We present LooseControl, which enables generalized depth conditioning for diffusion-based image generation. ControlNet, the state of the art in depth-conditioned image generation, produces remarkable results but relies on detailed depth maps for guidance, and in many scenarios creating such exact depth maps is challenging. This paper introduces a generalized version of depth conditioning that enables many new content-creation workflows. Specifically, we allow (C1) scene boundary control, for loosely specifying scenes with only boundary conditions, and (C2) 3D box control, for specifying the layout locations of target objects rather than their exact shape and appearance. Using LooseControl together with text guidance, users can create complex environments (e.g., rooms and street views) by specifying only the scene boundaries and the locations of the primary objects. Further, we provide two editing mechanisms to refine the results. (E1) 3D box editing lets the user refine an image by changing, adding, or removing boxes while freezing the image's style, so that the image changes minimally apart from the changes induced by the edited boxes. (E2) Attribute editing proposes possible editing directions for changing one particular aspect of the scene, such as the overall object density or a particular object. Extensive tests and comparisons with baselines demonstrate the generality of our method. We believe that LooseControl can become an important design tool for easily creating complex environments and can be extended to other guidance channels. Code and more information are available at https://shariqfarooq123.github.io/loose-control/.
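The abstract does not spell out how scene boundaries (C1) and 3D boxes (C2) become a condition image. The sketch below assumes they are rendered into a coarse depth map that stands in for an exact one. The room is modeled as an axis-aligned box seen from inside, so each pixel takes the depth where its ray exits through a wall, the floor, or the ceiling. Object boxes then override it with the depth of their nearest face. The `Box` class, `ray_box`, `render_loose_depth`, the pinhole camera, and the restriction to axis-aligned boxes are all hypothetical illustrations, not the LooseControl implementation.

```python
"""Hypothetical sketch: render a coarse depth map from a scene boundary and
3D boxes. Illustrative only; not the authors' code."""
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in camera coordinates.

    Assumed convention: camera at the origin, looking along +z, with y pointing down.
    """
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def ray_box(d, box):
    """Slab test for a ray starting at the origin with direction d.

    Returns (t_near, t_far), or None if the ray misses the box.
    """
    t_near, t_far = -math.inf, math.inf
    for i in range(3):
        if d[i] == 0.0:
            # The ray is parallel to this slab, so the origin must already lie inside it.
            if not (box.lo[i] <= 0.0 <= box.hi[i]):
                return None
            continue
        t1, t2 = box.lo[i] / d[i], box.hi[i] / d[i]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    if t_near > t_far or t_far <= 0.0:
        return None
    return t_near, t_far


def render_loose_depth(width, height, focal, boundary, objects):
    """Compute the z-depth of every pixel.

    Each pixel takes the nearest object box in front of the camera, or otherwise
    the point where its ray exits the scene boundary.
    """
    cx, cy = width / 2.0, height / 2.0
    depth = [[math.inf] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Pinhole ray through the pixel center. Because d_z = 1, the ray
            # parameter t equals the z-depth of the hit point.
            d = ((u + 0.5 - cx) / focal, (v + 0.5 - cy) / focal, 1.0)
            hit = ray_box(d, boundary)
            best = hit[1] if hit else math.inf  # exit point of the room
            for box in objects:
                h = ray_box(d, box)
                if h and h[0] > 0.0:  # front face of a box ahead of the camera
                    best = min(best, h[0])
            depth[v][u] = best
    return depth


# Usage: a room (scene boundary) with one box standing on the floor, e.g. a bed.
room = Box(lo=(-2.0, -1.5, -0.5), hi=(2.0, 1.5, 6.0))
bed = Box(lo=(-1.0, 0.5, 2.0), hi=(1.0, 1.5, 4.0))
dm = render_loose_depth(8, 8, focal=8.0, boundary=room, objects=[bed])
```

Treating the boundary as a box seen from inside keeps C1 and C2 under a single ray-box test. Oriented or rotated boxes would need a ray-versus-oriented-box intersection in place of the axis-aligned slab test. Turning such a map into an image is the generative part of the method, and this sketch does not cover it.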
