DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Abstract

In this work, we propose DiT360, a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. We attribute the difficulty of maintaining geometric fidelity and photorealism in generated panoramas mainly to the lack of large-scale, high-quality, real-world panoramic data; this data-centric view differs from prior methods that focus on model design. DiT360 comprises several key modules for inter-domain transformation and intra-domain augmentation, applied at both the pre-VAE image level and the post-VAE token level. At the image level, we incorporate cross-domain knowledge through perspective image guidance and panoramic refinement, which enhance perceptual quality while regularizing diversity and photorealism. At the token level, hybrid supervision is applied across multiple modules: circular padding for boundary continuity, yaw loss for rotational robustness, and cube loss for distortion awareness. Extensive experiments on text-to-panorama, inpainting, and outpainting tasks demonstrate that our method achieves better boundary consistency and image fidelity across eleven quantitative metrics. Our code is available at https://github.com/Insta360-Research-Team/DiT360.
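
As a rough illustration of two token-level ideas named in the abstract, the sketch below shows circular padding along the width axis and a yaw rotation expressed as a horizontal circular shift. It assumes an equirectangular layout stored as a plain list of rows. The function names `circular_pad_width` and `yaw_rotate` are hypothetical and are not taken from the DiT360 code base, and the sketch does not attempt to reproduce how the yaw loss or cube loss is actually computed.

```python
# Illustrative sketch only: the function names and the list-of-rows layout
# are assumptions, not taken from the DiT360 repository.

def circular_pad_width(grid, pad):
    """Wrap `pad` columns from each horizontal edge onto the opposite side,
    so the left and right borders of the panorama see each other."""
    if pad == 0:
        return [list(row) for row in grid]
    width = len(grid[0])
    if not 0 < pad <= width:
        raise ValueError("pad must be between 0 and the grid width")
    return [list(row[-pad:]) + list(row) + list(row[:pad]) for row in grid]


def yaw_rotate(grid, shift):
    """Shift every row circularly by `shift` columns. For an equirectangular
    panorama this corresponds to rotating the view about the vertical axis."""
    width = len(grid[0])
    shift %= width
    if shift == 0:
        return [list(row) for row in grid]
    return [list(row[-shift:]) + list(row[:-shift]) for row in grid]


if __name__ == "__main__":
    latent = [[0, 1, 2, 3],
              [4, 5, 6, 7]]
    print(circular_pad_width(latent, 1))
    print(yaw_rotate(latent, 1))
```

In a tensor library the same operations would normally be a wrap-mode pad and a roll along the width dimension; the pure-Python form is used here only to keep the sketch dependency-free.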
