Light-X: Generative 4D Video Rendering with Camera and Illumination Control

Abstract

Recent advances in illumination control extend image-based methods to video, yet they still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illumination, since visual dynamics are inherently shaped by both geometry and lighting. To this end, we present Light-X, a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. 1) We propose a disentangled design that decouples geometry and lighting signals: geometry and motion are captured via dynamic point clouds projected along user-defined camera trajectories, while illumination cues are provided by a relit frame consistently projected into the same geometry. These explicit, fine-grained cues enable effective disentanglement and guide high-quality illumination. 2) To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse mapping that synthesizes training pairs from in-the-wild monocular footage. This strategy yields a dataset covering static, dynamic, and AI-generated scenes, ensuring robust training. Extensive experiments show that Light-X outperforms baseline methods in joint camera-illumination control and surpasses prior video relighting methods under both text- and background-conditioned settings.
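The abstract describes the geometry cue as a dynamic point cloud projected along a user-defined camera trajectory. As a rough illustration of that single step, and not the authors' implementation, the sketch below projects a colored point cloud into one target camera and keeps the nearest point per pixel. It assumes a pinhole camera with intrinsics `K` and a world-to-camera pose `(R, t)`; the function names `project_points` and `render_frame` are hypothetical and do not come from the paper.

```python
import numpy as np

def project_points(points_world, K, R, t, height, width):
    """Project N x 3 world points into a pinhole camera (assumed model).

    Returns pixel coordinates (N x 2), depths (N,), and a visibility mask (N,).
    """
    # World -> camera coordinates: X_cam = R @ X_world + t
    points_cam = points_world @ R.T + t
    z = points_cam[:, 2]
    in_front = z > 1e-6
    # Avoid dividing by zero or negative depth for points behind the camera
    safe_z = np.where(in_front, z, 1.0)
    uv = (points_cam @ K.T)[:, :2] / safe_z[:, None]
    in_image = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv, z, in_front & in_image

def render_frame(points_world, colors, K, R, t, height, width):
    """Splat colored points into an image, keeping the nearest point per pixel.

    Returns the rendered image and a mask of pixels covered by at least one point.
    """
    uv, z, visible = project_points(points_world, K, R, t, height, width)
    image = np.zeros((height, width, 3), dtype=np.float64)
    zbuf = np.full((height, width), np.inf)
    for i in np.flatnonzero(visible):
        col = int(np.floor(uv[i, 0]))
        row = int(np.floor(uv[i, 1]))
        if z[i] < zbuf[row, col]:  # simple z-buffer test
            zbuf[row, col] = z[i]
            image[row, col] = colors[i]
    return image, np.isfinite(zbuf)
```

Calling `render_frame` once per frame, with that frame's points and the corresponding pose from the trajectory, yields a sequence of projected renders plus coverage masks. In this sketch, uncovered pixels are simply left black; how the actual framework consumes such renders and masks is not specified in the abstract.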
