IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
June 3, 2025
Authors: Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang
cs.AI
Abstract
Although diffusion-based models can generate high-quality and high-resolution
video sequences from textual or image inputs, they lack explicit integration of
geometric cues when controlling scene lighting and visual appearance across
frames. To address this limitation, we propose IllumiCraft, an end-to-end
diffusion framework accepting three complementary inputs: (1)
high-dynamic-range (HDR) video maps for detailed lighting control; (2)
synthetically relit frames with randomized illumination changes (optionally
paired with a static background reference image) to provide appearance cues;
and (3) 3D point tracks that capture precise 3D geometry information. By
integrating the lighting, appearance, and geometry cues within a unified
diffusion architecture, IllumiCraft generates temporally coherent videos
aligned with user-defined prompts. It supports background-conditioned and
text-conditioned video relighting and provides better fidelity than existing
controllable video generation methods. Project Page:
https://yuanze-lin.me/IllumiCraft_page
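The abstract describes three complementary conditioning signals (HDR lighting maps, relit appearance frames, and 3D point tracks) fused within one diffusion backbone. The sketch below is a hypothetical illustration of how such cues might be assembled into a single conditioning tensor; the function name, shapes, and toy track rasterization are assumptions, not the authors' implementation.

```python
import numpy as np

def build_conditioning(hdr_maps, relit_frames, point_tracks, H=32, W=32):
    """Concatenate lighting, appearance, and geometry cues channel-wise.

    Hypothetical sketch (not the paper's code). Shapes are illustrative:
      hdr_maps:     (T, H, W, 3)  HDR video maps (lighting control)
      relit_frames: (T, H, W, 3)  synthetically relit frames (appearance cues)
      point_tracks: (T, N, 3)     3D point trajectories in [0, 1] (geometry)
    """
    T = hdr_maps.shape[0]
    # Rasterize sparse 3D tracks into a per-frame depth-like map (toy projection).
    geom = np.zeros((T, H, W, 1), dtype=np.float32)
    for t in range(T):
        xs = np.clip((point_tracks[t, :, 0] * (W - 1)).astype(int), 0, W - 1)
        ys = np.clip((point_tracks[t, :, 1] * (H - 1)).astype(int), 0, H - 1)
        geom[t, ys, xs, 0] = point_tracks[t, :, 2]  # store depth at track pixels
    # Unified conditioning tensor: lighting + appearance + geometry channels,
    # which a diffusion model could consume alongside the noisy video latents.
    return np.concatenate([hdr_maps, relit_frames, geom], axis=-1)

T, H, W, N = 4, 32, 32, 64
rng = np.random.default_rng(0)
cond = build_conditioning(
    rng.random((T, H, W, 3), dtype=np.float32),
    rng.random((T, H, W, 3), dtype=np.float32),
    rng.random((T, N, 3), dtype=np.float32),
)
print(cond.shape)  # (4, 32, 32, 7)
```

Channel-wise concatenation is only one plausible fusion strategy; the actual architecture may inject each cue through separate encoders or cross-attention.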