BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

April 6, 2024
Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun
cs.AI

Abstract

Generating higher-resolution human-centric scenes with fine detail and control remains a challenge for existing text-to-image diffusion models. The challenge stems from the limited training image size, the limited capacity of the text encoder (a fixed token limit), and the inherent difficulty of generating complex scenes involving multiple humans. Current methods have attempted to address only the training-size limit, and they often yield human-centric scenes with severe artifacts. We propose BeyondScene, a novel framework that overcomes these limitations and uses existing pretrained diffusion models to generate exquisite higher-resolution (over 8K) human-centric scenes with exceptional text-image correspondence and naturalness. BeyondScene takes a staged, hierarchical approach. First, it generates a detailed base image that focuses on the elements crucial to instance creation for multiple humans and for detailed descriptions that exceed the diffusion model's token limit. Then, it seamlessly converts the base image into a higher-resolution output that exceeds the training image size and incorporates text- and instance-aware details. This conversion is performed by our novel instance-aware hierarchical enlargement process, which consists of our proposed high-frequency injected forward diffusion and adaptive joint diffusion. BeyondScene surpasses existing methods in both correspondence with detailed text descriptions and naturalness, paving the way for advanced applications in higher-resolution human-centric scene creation beyond the capacity of pretrained diffusion models, without costly retraining. Project page: https://janeyeon.github.io/beyond-scene.
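The abstract names the components of the enlargement stage but does not specify them. The sketch below illustrates, under stated assumptions, generic building blocks that such a stage can be assembled from: nearest-neighbor upsampling of a base latent, the standard DDPM forward-diffusion step q(x_t | x_0) that re-noises it, and a MultiDiffusion-style joint diffusion step that denoises overlapping windows independently and averages their predictions. It is not BeyondScene's implementation. The high-frequency injection and the adaptive, instance-aware window placement are not modeled, and `denoise_window` is a hypothetical stand-in for one call to a pretrained diffusion model on a training-size crop. Latents are single-channel lists of lists so the example stays self-contained.

```python
import math
from typing import Callable, List

Image = List[List[float]]  # single-channel H x W latent (simplifying assumption)


def upsample_nearest(img: Image, factor: int) -> Image:
    """Enlarge a latent by an integer factor with nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    return [[img[i // factor][j // factor] for j in range(w * factor)]
            for i in range(h * factor)]


def forward_diffuse(x0: Image, alpha_bar: float, noise: Image) -> Image:
    """Standard DDPM forward step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[s * a + n * e for a, e in zip(row_x, row_n)]
            for row_x, row_n in zip(x0, noise)]


def window_starts(size: int, window: int, stride: int) -> List[int]:
    """Start offsets so that overlapping windows cover [0, size) completely."""
    if window >= size:
        return [0]
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)  # last window sits flush with the border
    return starts


def joint_denoise_step(latent: Image, window: int, stride: int,
                       denoise_window: Callable[[Image], Image]) -> Image:
    """One joint diffusion step: denoise each overlapping window on its own,
    then average the per-window predictions wherever windows overlap."""
    h, w = len(latent), len(latent[0])
    acc = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for top in window_starts(h, window, stride):
        for left in window_starts(w, window, stride):
            crop = [row[left:left + window] for row in latent[top:top + window]]
            pred = denoise_window(crop)  # hypothetical pretrained-model call
            for i, row in enumerate(pred):
                for j, value in enumerate(row):
                    acc[top + i][left + j] += value
                    count[top + i][left + j] += 1
    return [[acc[i][j] / count[i][j] for j in range(w)] for i in range(h)]
```

Averaging over overlapping windows is the usual way joint diffusion lets a model trained at a fixed size produce a larger canvas while keeping neighboring windows consistent; choosing `stride` smaller than `window` controls how much they overlap.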

December 15, 2024