

Generative World Renderer

April 2, 2026
Authors: Zheng-Hui Huang, Zhixiang Wang, Jiaming Tan, Ruihan Yu, Yidan Zhang, Bo Zheng, Yu-Lun Liu, Yung-Yu Chuang, Kaipeng Zhang
cs.AI

Abstract

Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a novel dual-screen stitched capture method, we extracted 4M continuous frames (720p/30 FPS) of synchronized RGB and five G-buffer channels across diverse scenes, visual effects, and environments, including adverse-weather and motion-blur variants. This dataset uniquely advances bidirectional rendering: it enables robust in-the-wild geometry and material decomposition, and it facilitates high-fidelity G-buffer-guided video generation. Furthermore, to evaluate the real-world performance of inverse rendering without ground truth, we propose a novel VLM-based assessment protocol measuring semantic, spatial, and temporal consistency. Experiments demonstrate that inverse renderers fine-tuned on our data achieve superior cross-dataset generalization and controllable generation, while our VLM evaluation correlates strongly with human judgment. Combined with our toolkit, our forward renderer lets users edit the visual style of AAA games directly from G-buffers using text prompts.