Generative World Renderer
April 2, 2026
Authors: Zheng-Hui Huang, Zhixiang Wang, Jiaming Tan, Ruihan Yu, Yidan Zhang, Bo Zheng, Yu-Lun Liu, Yung-Yu Chuang, Kaipeng Zhang
cs.AI
Abstract
Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a novel dual-screen stitched capture method, we extracted 4M continuous frames (720p/30 FPS) of synchronized RGB and five G-buffer channels across diverse scenes, visual effects, and environments, including adverse weather and motion-blur variants. This dataset uniquely advances bidirectional rendering: enabling robust in-the-wild geometry and material decomposition, and facilitating high-fidelity G-buffer-guided video generation. Furthermore, to evaluate the real-world performance of inverse rendering without ground truth, we propose a novel VLM-based assessment protocol measuring semantic, spatial, and temporal consistency. Experiments demonstrate that inverse renderers fine-tuned on our data achieve superior cross-dataset generalization and controllable generation, while our VLM evaluation strongly correlates with human judgment. Combined with our toolkit, our forward renderer enables users to edit styles of AAA games from G-buffers using text prompts.
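As a rough illustration of the dataset layout described above, the sketch below models one capture sample: a 720p RGB frame paired with five synchronized G-buffer channels. This is a minimal assumption-laden sketch, not the authors' actual format; the channel names (`albedo`, `normal`, `depth`, `roughness`, `irradiance`) are hypothetical, since the abstract does not enumerate the five channels.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical per-frame sample; the five channel names below are
# illustrative assumptions, not taken from the paper.
GBUFFER_CHANNELS = ["albedo", "normal", "depth", "roughness", "irradiance"]


@dataclass
class FrameSample:
    rgb: np.ndarray                 # (H, W, 3) final rendered frame
    gbuffers: dict[str, np.ndarray] # channel name -> (H, W) array, synced with rgb


def make_dummy_sample(h: int = 720, w: int = 1280) -> FrameSample:
    """Build a placeholder sample at the 720p capture resolution."""
    return FrameSample(
        rgb=np.zeros((h, w, 3), dtype=np.uint8),
        gbuffers={name: np.zeros((h, w), dtype=np.float32)
                  for name in GBUFFER_CHANNELS},
    )


sample = make_dummy_sample()
print(sample.rgb.shape, len(sample.gbuffers))  # → (720, 1280, 3) 5
```

An inverse renderer would be trained to predict `sample.gbuffers` from `sample.rgb`, while the forward (G-buffer-guided) generator goes the other way.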