DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
October 13, 2025
Authors: Haoran Feng, Dizhe Zhang, Xiangtai Li, Bo Du, Lu Qi
cs.AI
Abstract
In this work, we propose DiT360, a DiT-based framework that performs hybrid
training on perspective and panoramic data for panoramic image generation. We
attribute the persistent difficulty of maintaining geometric fidelity and
photorealism mainly to the lack of large-scale, high-quality, real-world
panoramic data; this data-centric view differs from prior methods that focus
on model design. DiT360 comprises several key modules for inter-domain
transformation and intra-domain augmentation, applied at both the pre-VAE
image level and the post-VAE token level. At the image level, we incorporate
cross-domain knowledge through perspective-image guidance and panoramic
refinement, which enhance perceptual quality while regularizing diversity and
photorealism. At the token level, hybrid supervision is applied across
multiple modules: circular padding for boundary continuity, a yaw loss for
rotational robustness, and a cube loss for distortion awareness. Extensive
experiments on text-to-panorama, inpainting, and outpainting tasks demonstrate
that our method achieves better boundary consistency and image fidelity across
eleven quantitative metrics. Our code is available at
https://github.com/Insta360-Research-Team/DiT360.
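Two of the token-level ideas named in the abstract have simple geometric interpretations on an equirectangular latent: circular padding wraps the left and right edges so the panorama's seam stays continuous, and a horizontal roll of the width axis corresponds to a yaw rotation, which can serve as an augmentation for rotational robustness. The sketch below illustrates both in NumPy; the array shapes and function names are assumptions for the example, not the authors' implementation.

```python
import numpy as np

def circular_pad_width(tokens: np.ndarray, pad: int) -> np.ndarray:
    """Pad a (..., W) latent along the width axis by wrapping around,
    so convolutions see a continuous left/right panorama seam."""
    return np.concatenate(
        [tokens[..., -pad:], tokens, tokens[..., :pad]], axis=-1
    )

def random_yaw_roll(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Roll the latent along width: for an equirectangular panorama,
    a horizontal shift is exactly a yaw rotation of the scene."""
    shift = int(rng.integers(0, tokens.shape[-1]))
    return np.roll(tokens, shift, axis=-1)

# Toy latent: 1 channel, height 4, width 8.
x = np.arange(32, dtype=float).reshape(1, 4, 8)
padded = circular_pad_width(x, pad=2)          # width 8 -> 12
rolled = random_yaw_roll(x, np.random.default_rng(0))
```

A yaw roll preserves the multiset of token values per row, which is why a loss comparing outputs before and after rolling (as the abstract's yaw loss suggests) is well defined at every shift.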