A Hybrid Approach for Closing the Sim2real Appearance Gap in Game Engine Synthetic Datasets
May 4, 2026
Author: Stefanos Pasios
cs.AI
Abstract
Video game engines have become an important source for generating large volumes of synthetic visual datasets used to train and evaluate computer vision algorithms destined for real-world deployment. While technologies such as ray tracing have significantly improved the visual fidelity of modern game engines, a notable sim2real appearance gap between synthetic and real-world images remains, limiting the utility of synthetic datasets in real-world applications. In this letter, we investigate the ability of a state-of-the-art image generation and editing diffusion model (FLUX.2-4B Klein) to enhance the photorealism of synthetic datasets and compare its performance against a traditional image-to-image translation model (REGEN). Furthermore, we propose a hybrid approach that combines the strong geometry and material transformations of diffusion-based methods with the distribution-matching capabilities of image-to-image translation techniques. Our experiments demonstrate that REGEN outperforms FLUX.2-4B Klein, and that combining the two models yields better visual realism than either model alone while maintaining semantic consistency. The code is available at: https://github.com/stefanos50/Hybrid-Sim2Real
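A minimal sketch of how such a two-stage hybrid pipeline could be composed. This is an assumption based on the abstract alone: the function names `flux_edit` and `regen_translate` are hypothetical stand-ins, not the repository's actual API, and the stage ordering (diffusion editing first, translation-based refinement second) is inferred from the described roles of the two models.

```python
import numpy as np

# Hypothetical placeholder for the FLUX.2-4B Klein editing pass
# (strong geometry and material transformations). Not the real model API.
def flux_edit(img: np.ndarray) -> np.ndarray:
    return np.clip(img * 1.05, 0.0, 1.0)

# Hypothetical placeholder for the REGEN image-to-image translation pass
# (distribution matching toward real-image statistics). Not the real model API.
def regen_translate(img: np.ndarray) -> np.ndarray:
    return np.clip(img - img.mean() + 0.5, 0.0, 1.0)

def hybrid_sim2real(synthetic: np.ndarray) -> np.ndarray:
    """Compose the two stages: diffusion-based editing, then
    translation-based refinement of the edited frame."""
    edited = flux_edit(synthetic)
    return regen_translate(edited)

# Toy synthetic frame in [0, 1], HxWxC layout.
frame = np.random.default_rng(0).random((256, 256, 3))
out = hybrid_sim2real(frame)
print(out.shape)  # (256, 256, 3)
```

In an actual implementation the placeholders would be replaced by inference calls to the respective pretrained models; the sketch only illustrates the sequential composition the abstract describes.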