IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
June 3, 2025
Authors: Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang
cs.AI
Abstract
Although diffusion-based models can generate high-quality and high-resolution
video sequences from textual or image inputs, they lack explicit integration of
geometric cues when controlling scene lighting and visual appearance across
frames. To address this limitation, we propose IllumiCraft, an end-to-end
diffusion framework accepting three complementary inputs: (1)
high-dynamic-range (HDR) video maps for detailed lighting control; (2)
synthetically relit frames with randomized illumination changes (optionally
paired with a static background reference image) to provide appearance cues;
and (3) 3D point tracks that capture precise 3D geometry information. By
integrating the lighting, appearance, and geometry cues within a unified
diffusion architecture, IllumiCraft generates temporally coherent videos
aligned with user-defined prompts. It supports background-conditioned and
text-conditioned video relighting and provides better fidelity than existing
controllable video generation methods. Project Page:
https://yuanze-lin.me/IllumiCraft_page
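The abstract describes three complementary conditioning signals (HDR lighting maps, relit appearance frames, and 3D point tracks) fused within one diffusion backbone. The sketch below is a hypothetical illustration of how such cues might be assembled into a single conditioning tensor; the function name, shapes, and toy track rasterization are assumptions, not the authors' implementation.

```python
import numpy as np

def build_conditioning(hdr_maps, relit_frames, point_tracks, H=32, W=32):
    """Concatenate lighting, appearance, and geometry cues channel-wise.

    Hypothetical sketch (not the paper's code). Shapes are illustrative:
      hdr_maps:     (T, H, W, 3)  HDR video maps (lighting control)
      relit_frames: (T, H, W, 3)  synthetically relit frames (appearance cues)
      point_tracks: (T, N, 3)     3D point trajectories in [0, 1] (geometry)
    """
    T = hdr_maps.shape[0]
    # Rasterize sparse 3D tracks into a per-frame depth-like map (toy projection).
    geom = np.zeros((T, H, W, 1), dtype=np.float32)
    for t in range(T):
        xs = np.clip((point_tracks[t, :, 0] * (W - 1)).astype(int), 0, W - 1)
        ys = np.clip((point_tracks[t, :, 1] * (H - 1)).astype(int), 0, H - 1)
        geom[t, ys, xs, 0] = point_tracks[t, :, 2]  # store depth at track pixels
    # Unified conditioning tensor: lighting + appearance + geometry channels,
    # which a diffusion model could consume alongside the noisy video latents.
    return np.concatenate([hdr_maps, relit_frames, geom], axis=-1)

T, H, W, N = 4, 32, 32, 64
rng = np.random.default_rng(0)
cond = build_conditioning(
    rng.random((T, H, W, 3), dtype=np.float32),
    rng.random((T, H, W, 3), dtype=np.float32),
    rng.random((T, N, 3), dtype=np.float32),
)
print(cond.shape)  # (4, 32, 32, 7)
```

Channel-wise concatenation is only one plausible fusion strategy; the actual architecture may inject each cue through separate encoders or cross-attention.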