IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
June 3, 2025
Authors: Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang
cs.AI
Abstract
Although diffusion-based models can generate high-quality and high-resolution
video sequences from textual or image inputs, they lack explicit integration of
geometric cues when controlling scene lighting and visual appearance across
frames. To address this limitation, we propose IllumiCraft, an end-to-end
diffusion framework accepting three complementary inputs: (1)
high-dynamic-range (HDR) video maps for detailed lighting control; (2)
synthetically relit frames with randomized illumination changes (optionally
paired with a static background reference image) to provide appearance cues;
and (3) 3D point tracks that capture precise 3D geometry information. By
integrating the lighting, appearance, and geometry cues within a unified
diffusion architecture, IllumiCraft generates temporally coherent videos
aligned with user-defined prompts. It supports background-conditioned and
text-conditioned video relighting and provides better fidelity than existing
controllable video generation methods. Project Page:
https://yuanze-lin.me/IllumiCraft_page
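The abstract describes fusing three complementary conditioning streams (HDR lighting maps, relit appearance frames, and 3D point tracks) in one diffusion backbone. As a minimal, purely illustrative sketch of such fusion, one could rasterize the point tracks into a per-frame map and concatenate all cues channel-wise; the shapes, encoder choices, and fusion strategy below are assumptions for illustration, not the paper's actual method:

```python
import numpy as np

# Hypothetical dimensions; the paper does not specify these in the abstract.
T, H, W = 8, 32, 32   # frames, height, width
N_TRACKS = 64         # number of 3D point tracks

def encode_conditions(hdr_video, relit_frames, point_tracks):
    """Fuse the three complementary cues into one conditioning tensor.

    hdr_video:    (T, H, W, 3) HDR maps for lighting control
    relit_frames: (T, H, W, 3) synthetically relit frames (appearance cues)
    point_tracks: (T, N_TRACKS, 3) 3D point trajectories (geometry cues)
    """
    # Rasterize track positions into a per-frame occupancy map so all cues
    # share a spatial layout (one simple choice, not the paper's encoder).
    geo = np.zeros((T, H, W, 1))
    xy = np.clip((point_tracks[..., :2] * [H, W]).astype(int),
                 0, [H - 1, W - 1])
    for t in range(T):
        geo[t, xy[t, :, 0], xy[t, :, 1], 0] = 1.0
    # Channel-wise concatenation -> (T, H, W, 7) unified conditioning input
    # that a diffusion U-Net or DiT could consume alongside the noisy latent.
    return np.concatenate([hdr_video, relit_frames, geo], axis=-1)

cond = encode_conditions(
    np.random.rand(T, H, W, 3),
    np.random.rand(T, H, W, 3),
    np.random.rand(T, N_TRACKS, 3),
)
print(cond.shape)  # (8, 32, 32, 7)
```

Channel concatenation is only one common way to inject dense conditions into a video diffusion model; cross-attention or adapter branches are equally plausible designs for the geometry stream.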