CineScale: Free Lunch in High-Resolution Cinematic Visual Generation

August 21, 2025
Authors: Haonan Qiu, Ning Yu, Ziqi Huang, Paul Debevec, Ziwei Liu
cs.AI

Abstract

Visual diffusion models have made remarkable progress, yet they are typically trained at limited resolutions because high-resolution data is scarce and computational resources are constrained, which hampers their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to unlock the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods are still prone to producing low-quality visual content with repetitive patterns. The key obstacle is the inevitable increase in high-frequency information when a model generates visual content beyond its training resolution, which leads to undesirable repetitive patterns arising from accumulated errors. In this work, we propose CineScale, a novel inference paradigm that enables higher-resolution visual generation. To tackle the distinct issues introduced by the two types of video generation architectures, we propose dedicated variants tailored to each. Unlike existing baseline methods, which are confined to high-resolution text-to-image (T2I) and text-to-video (T2V) generation, CineScale broadens the scope by enabling high-resolution image-to-video (I2V) and video-to-video (V2V) synthesis, built atop state-of-the-art open-source video generation frameworks. Extensive experiments validate the superiority of our paradigm in extending the higher-resolution visual generation capabilities of both image and video models. Remarkably, our approach enables 8K image generation without any fine-tuning and achieves 4K video generation with only minimal LoRA fine-tuning. Generated video samples are available at our website: https://eyeline-labs.github.io/CineScale/.