DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
October 13, 2025
Authors: Haoran Feng, Dizhe Zhang, Xiangtai Li, Bo Du, Lu Qi
cs.AI
Abstract
In this work, we propose DiT360, a DiT-based framework that performs hybrid
training on perspective and panoramic data for panoramic image generation. We
attribute the difficulty of maintaining geometric fidelity and photorealism in
generated panoramas mainly to the lack of large-scale, high-quality, real-world
panoramic data; this data-centric view differs from prior methods that focus on
model design. DiT360 comprises several key modules for inter-domain
transformation and intra-domain augmentation, applied at both the pre-VAE image
level and the post-VAE token level. At the image level, we incorporate
cross-domain knowledge through perspective image guidance and panoramic
refinement, which enhance perceptual quality while regularizing diversity and
photorealism. At the token level, hybrid supervision is applied across multiple
modules, including circular padding for boundary continuity, a yaw loss for
rotational robustness, and a cube loss for distortion awareness. Extensive
experiments on text-to-panorama, inpainting, and outpainting tasks demonstrate
that our method achieves better boundary consistency and image fidelity across
eleven quantitative metrics. Our code is available at
https://github.com/Insta360-Research-Team/DiT360.
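Two of the token-level ideas mentioned above are straightforward to illustrate. In an equirectangular panorama, the left and right edges are physically adjacent, so wrap-around (circular) padding along the width axis keeps the seam continuous, and a yaw rotation of the scene corresponds to a circular shift along the same axis. The sketch below illustrates these two operations on a latent token grid; it is a minimal, hypothetical example using standard PyTorch ops, not the authors' implementation (function names and the `(B, C, H, W)` layout are assumptions).

```python
import torch
import torch.nn.functional as F


def circular_pad_width(tokens: torch.Tensor, pad: int) -> torch.Tensor:
    """Wrap-pad the width axis so the left/right panorama seam is continuous.

    tokens: latent grid of shape (B, C, H, W), e.g. post-VAE tokens.
    F.pad's tuple is (left, right, top, bottom); we only pad the width.
    """
    return F.pad(tokens, (pad, pad, 0, 0), mode="circular")


def random_yaw_shift(tokens: torch.Tensor) -> torch.Tensor:
    """Random yaw augmentation: rotating a panorama about the vertical axis
    is a circular shift of its equirectangular grid along the width axis."""
    shift = int(torch.randint(0, tokens.shape[-1], (1,)))
    return torch.roll(tokens, shifts=shift, dims=-1)
```

A yaw loss in this spirit would enforce that the model's prediction commutes with such shifts (predict-then-shift matches shift-then-predict), which is what makes the generated panorama rotation-robust.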