MultiBooth: Towards Generating All Your Concepts in an Image from Text
April 22, 2024
Authors: Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Li Xiu
cs.AI
Abstract
This paper introduces MultiBooth, a novel and efficient technique for
multi-concept customization in image generation from text. Despite the
significant advancements in customized generation methods, particularly with
the success of diffusion models, existing methods often struggle with
multi-concept scenarios due to low concept fidelity and high inference cost.
MultiBooth addresses these issues by dividing the multi-concept generation
process into two phases: a single-concept learning phase and a multi-concept
integration phase. During the single-concept learning phase, we employ a
multi-modal image encoder and an efficient concept encoding technique to learn
a concise and discriminative representation for each concept. In the
multi-concept integration phase, we use bounding boxes to define the generation
area for each concept within the cross-attention map. This method enables the
creation of individual concepts within their specified regions, thereby
facilitating the formation of multi-concept images. This strategy not only
improves concept fidelity but also reduces additional inference cost.
MultiBooth surpasses various baselines in both qualitative and quantitative
evaluations, showcasing its superior performance and computational efficiency.
Project Page: https://multibooth.github.io/
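The core idea of the integration phase, restricting each concept token's cross-attention to a user-supplied bounding box, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name, the attention-map layout, and the renormalization step are all illustrative assumptions.

```python
import numpy as np

def region_masked_attention(attn, boxes, H, W):
    """Restrict each concept token's cross-attention to its bounding box.

    attn:  (H*W, T) cross-attention map (spatial positions x text tokens).
    boxes: dict mapping a concept token index -> (y0, x0, y1, x1) on the
           H x W latent grid (hypothetical layout, for illustration only).
    """
    attn = attn.copy()
    for tok, (y0, x0, y1, x1) in boxes.items():
        mask = np.zeros((H, W), dtype=bool)
        mask[y0:y1, x0:x1] = True
        # Zero this concept's attention outside its designated region.
        attn[:, tok] *= mask.reshape(-1)
    # Renormalize over tokens so each spatial position still sums to 1.
    attn /= attn.sum(axis=1, keepdims=True) + 1e-8
    return attn
```

With two concept tokens assigned to the top and bottom halves of a 4x4 grid, each concept's attention mass is confined to its own region, which is how the paper's strategy keeps individually learned concepts from bleeding into one another's areas.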