IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
June 3, 2025
Authors: Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang
cs.AI
Abstract
Although diffusion-based models can generate high-quality and high-resolution
video sequences from textual or image inputs, they lack explicit integration of
geometric cues when controlling scene lighting and visual appearance across
frames. To address this limitation, we propose IllumiCraft, an end-to-end
diffusion framework accepting three complementary inputs: (1)
high-dynamic-range (HDR) video maps for detailed lighting control; (2)
synthetically relit frames with randomized illumination changes (optionally
paired with a static background reference image) to provide appearance cues;
and (3) 3D point tracks that capture precise 3D geometry information. By
integrating the lighting, appearance, and geometry cues within a unified
diffusion architecture, IllumiCraft generates temporally coherent videos
aligned with user-defined prompts. It supports background-conditioned and
text-conditioned video relighting and provides better fidelity than existing
controllable video generation methods. Project Page:
https://yuanze-lin.me/IllumiCraft_page
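The abstract describes fusing three complementary conditioning streams (HDR lighting maps, relit appearance frames, and 3D point tracks) in one diffusion backbone. As a minimal, purely illustrative sketch of such fusion, one could rasterize the point tracks into a per-frame map and concatenate all cues channel-wise; the shapes, encoder choices, and fusion strategy below are assumptions for illustration, not the paper's actual method:

```python
import numpy as np

# Hypothetical dimensions; the paper does not specify these in the abstract.
T, H, W = 8, 32, 32   # frames, height, width
N_TRACKS = 64         # number of 3D point tracks

def encode_conditions(hdr_video, relit_frames, point_tracks):
    """Fuse the three complementary cues into one conditioning tensor.

    hdr_video:    (T, H, W, 3) HDR maps for lighting control
    relit_frames: (T, H, W, 3) synthetically relit frames (appearance cues)
    point_tracks: (T, N_TRACKS, 3) 3D point trajectories (geometry cues)
    """
    # Rasterize track positions into a per-frame occupancy map so all cues
    # share a spatial layout (one simple choice, not the paper's encoder).
    geo = np.zeros((T, H, W, 1))
    xy = np.clip((point_tracks[..., :2] * [H, W]).astype(int),
                 0, [H - 1, W - 1])
    for t in range(T):
        geo[t, xy[t, :, 0], xy[t, :, 1], 0] = 1.0
    # Channel-wise concatenation -> (T, H, W, 7) unified conditioning input
    # that a diffusion U-Net or DiT could consume alongside the noisy latent.
    return np.concatenate([hdr_video, relit_frames, geo], axis=-1)

cond = encode_conditions(
    np.random.rand(T, H, W, 3),
    np.random.rand(T, H, W, 3),
    np.random.rand(T, N_TRACKS, 3),
)
print(cond.shape)  # (8, 32, 32, 7)
```

Channel concatenation is only one common way to inject dense conditions into a video diffusion model; cross-attention or adapter branches are equally plausible designs for the geometry stream.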