BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

April 6, 2024
Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun
cs.AI

Abstract

Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, limited text encoder capacity (a bounded number of tokens), and the inherent difficulty of generating complex scenes involving multiple humans. Current methods address only the training size limit, and they often yield human-centric scenes with severe artifacts. We propose BeyondScene, a novel framework that overcomes these limitations, generating exquisite higher-resolution (over 8K) human-centric scenes with exceptional text-image correspondence and naturalness using existing pretrained diffusion models. BeyondScene employs a staged and hierarchical approach. It first generates a detailed base image that focuses on the crucial elements of instance creation for multiple humans and on detailed descriptions beyond the token limit of the diffusion model. It then seamlessly converts the base image into a higher-resolution output that exceeds the training image size and incorporates text-aware and instance-aware details, via a novel instance-aware hierarchical enlargement process consisting of the proposed high-frequency injected forward diffusion and adaptive joint diffusion. BeyondScene surpasses existing methods in correspondence with detailed text descriptions and in naturalness, paving the way for advanced applications in higher-resolution human-centric scene creation beyond the capacity of pretrained diffusion models, without costly retraining. Project page: https://janeyeon.github.io/beyond-scene.
