
CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

August 27, 2024
Authors: Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He
cs.AI

Abstract

Satellite-to-street view synthesis aims to generate a realistic street-view image from its corresponding satellite-view image. Although stable diffusion models have exhibited remarkable performance in a variety of image generation applications, their reliance on similar-view inputs to control the generated structure or texture restricts their application to the challenging cross-view synthesis task. In this work, we propose CrossViewDiff, a cross-view diffusion model for satellite-to-street view synthesis. To address the challenges posed by the large discrepancy across views, we design satellite scene structure estimation and cross-view texture mapping modules to construct the structural and textural controls for street-view image synthesis. We further design a cross-view control guided denoising process that incorporates these controls via an enhanced cross-view attention module. To evaluate the synthesis results more comprehensively, we additionally design a GPT-based scoring method as a supplement to standard evaluation metrics. We also explore the effect of different data sources (e.g., text, maps, building heights, and multi-temporal satellite imagery) on this task. Results on three public cross-view datasets show that CrossViewDiff outperforms current state-of-the-art methods on both standard and GPT-based evaluation metrics, generating high-quality street-view panoramas with more realistic structures and textures across rural, suburban, and urban scenes. The code and models of this work will be released at https://opendatalab.github.io/CrossViewDiff/.
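
The abstract does not describe the internals of the enhanced cross-view attention module. As a rough illustration of the general idea only, the sketch below shows plain scaled dot-product cross-attention in which street-view tokens act as queries and satellite-derived control tokens act as keys and values. This is a generic textbook formulation, not the CrossViewDiff module; the function names, token shapes, and toy values are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: street-view tokens, shape (n_q, d)      -- hypothetical
    keys:    satellite control tokens, shape (n_k, d) -- hypothetical
    values:  satellite control tokens, shape (n_k, d_v)
    Returns one attended vector of length d_v per query.
    """
    d = len(queries[0])
    d_v = len(values[0])
    attended = []
    for q in queries:
        # Similarity of this street-view token to every control token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the control values.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        )
    return attended


# Toy example: 2 street-view tokens attend over 3 control tokens.
street_tokens = [[1.0, 0.0], [0.0, 1.0]]
control_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(street_tokens, control_tokens, control_tokens)
```

Each output row is a convex combination of the control tokens, so a street-view token draws most heavily on the control tokens it is most similar to. In a diffusion model, a layer of this kind would typically sit inside the denoising network and be applied at every denoising step; how CrossViewDiff enhances it is not stated in the abstract.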

