

Learning Continuous 3D Words for Text-to-Image Generation

February 13, 2024
Authors: Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni
cs.AI

Abstract

Current controls over diffusion models (e.g., through text or ControlNet) for image generation fall short in recognizing abstract, continuous attributes like illumination direction or non-rigid shape change. In this paper, we present an approach for allowing users of text-to-image models to have fine-grained control of several attributes in an image. We do this by engineering special sets of input tokens that can be transformed in a continuous manner -- we call them Continuous 3D Words. These attributes can, for example, be represented as sliders and applied jointly with text prompts for fine-grained control over image generation. Given only a single mesh and a rendering engine, we show that our approach can be adopted to provide continuous user control over several 3D-aware attributes, including time-of-day illumination, bird wing orientation, dolly zoom effect, and object poses. Our method is capable of conditioning image creation with multiple Continuous 3D Words and text descriptions simultaneously while adding no overhead to the generative process. Project Page: https://ttchengab.github.io/continuous_3d_words
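
To make the core idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: a small learned network maps a continuous slider value (e.g., time-of-day) to a "word" embedding that is appended to the prompt embeddings conditioning the diffusion model. All names here (ContinuousWord, the embedding size, the slider range) are illustrative assumptions, not the paper's actual implementation, which is not specified in the abstract.

```python
# Sketch, not the paper's code: map a continuous attribute value to a token
# embedding and append it to the text-prompt embeddings that condition a
# diffusion model's cross-attention. Dimensions follow common CLIP/SD sizes.
import torch
import torch.nn as nn


class ContinuousWord(nn.Module):
    """Hypothetical module: maps a scalar attribute in [0, 1] to a token embedding."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, value: torch.Tensor) -> torch.Tensor:
        # value: (batch,) slider positions, e.g., time-of-day normalized to [0, 1]
        return self.mlp(value.unsqueeze(-1))  # (batch, embed_dim)


# Usage: concatenate the continuous-word token onto the usual prompt embeddings
# before they are fed to the denoising network; multiple such words (lighting,
# wing pose, dolly zoom, ...) could be appended the same way.
text_embeds = torch.randn(1, 77, 768)   # stand-in for CLIP text encoder output
illumination = ContinuousWord()
slider = torch.tensor([0.3])            # user-chosen attribute value
cond = torch.cat([text_embeds, illumination(slider).unsqueeze(1)], dim=1)
print(cond.shape)                        # torch.Size([1, 78, 768])
```

Because the extra token is produced once per generation and simply extends the conditioning sequence, this kind of design would add essentially no overhead to the sampling loop, consistent with the abstract's claim.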
