

CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

August 27, 2024
Authors: Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He
cs.AI

Abstract

Satellite-to-street view synthesis aims to generate a realistic street-view image from its corresponding satellite-view image. Although stable diffusion models have exhibited remarkable performance in a variety of image generation applications, their reliance on similar-view inputs to control the generated structure or texture restricts their application to the challenging cross-view synthesis task. In this work, we propose CrossViewDiff, a cross-view diffusion model for satellite-to-street view synthesis. To address the challenges posed by the large discrepancy between views, we design satellite scene structure estimation and cross-view texture mapping modules that construct the structural and textural controls for street-view image synthesis. We further design a cross-view control guided denoising process that incorporates these controls via an enhanced cross-view attention module. To evaluate the synthesis results more comprehensively, we additionally design a GPT-based scoring method as a supplement to standard evaluation metrics. We also explore the effect of different data sources (e.g., text, maps, building heights, and multi-temporal satellite imagery) on this task. Results on three public cross-view datasets show that CrossViewDiff outperforms current state-of-the-art methods on both standard and GPT-based evaluation metrics, generating high-quality street-view panoramas with more realistic structures and textures across rural, suburban, and urban scenes. The code and models of this work will be released at https://opendatalab.github.io/CrossViewDiff/.
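The abstract does not say how the cross-view texture mapping module relates satellite pixels to street-view pixels. For orientation only, the sketch below shows a widely used geometric baseline for this kind of mapping, not CrossViewDiff's method. It assumes flat ground, a street-view camera at a known height standing at the center of a north-up satellite tile with a known ground resolution, and an equirectangular panorama whose azimuth runs clockwise from north. The function names `panorama_to_satellite_coords` and `sample_satellite` and all of their parameters are hypothetical.

```python
import numpy as np


def panorama_to_satellite_coords(pano_h, pano_w, sat_size, meters_per_pixel, camera_height):
    """Map every panorama pixel to a (row, col) position in a north-up satellite tile.

    Hypothetical flat-ground sketch: the camera sits at the tile center, panorama
    azimuth starts at north and increases clockwise. Pixels at or above the
    horizon never hit the ground plane, so they are returned as NaN.
    """
    rows = (np.arange(pano_h) + 0.5) / pano_h         # normalized row centers in (0, 1)
    cols = (np.arange(pano_w) + 0.5) / pano_w         # normalized column centers in (0, 1)
    elevation = np.pi / 2 - np.pi * rows              # +pi/2 at the top, -pi/2 at the bottom
    azimuth = 2 * np.pi * cols                        # clockwise from north
    elev, azim = np.meshgrid(elevation, azimuth, indexing="ij")  # shape (pano_h, pano_w)

    # Horizontal distance (meters) at which a downward-looking ray meets the ground.
    below = elev < 0
    dist = np.full(elev.shape, np.nan)
    dist[below] = camera_height / np.tan(-elev[below])

    east = dist * np.sin(azim)
    north = dist * np.cos(azim)
    center = sat_size / 2.0
    sat_col = center + east / meters_per_pixel
    sat_row = center - north / meters_per_pixel       # image rows grow southward
    return sat_row, sat_col


def sample_satellite(sat_img, sat_row, sat_col):
    """Nearest-neighbour lookup; sky pixels and pixels outside the tile become zeros."""
    h, w = sat_img.shape[:2]
    out = np.zeros(sat_row.shape + sat_img.shape[2:], dtype=sat_img.dtype)
    valid = np.isfinite(sat_row) & np.isfinite(sat_col)
    r = np.full(sat_row.shape, -1, dtype=int)
    c = np.full(sat_col.shape, -1, dtype=int)
    r[valid] = np.floor(sat_row[valid]).astype(int)
    c[valid] = np.floor(sat_col[valid]).astype(int)
    inside = valid & (r >= 0) & (r < h) & (c >= 0) & (c < w)
    out[inside] = sat_img[r[inside], c[inside]]
    return out
```

For example, `rows, cols = panorama_to_satellite_coords(512, 1024, 256, 0.5, 2.0)` followed by `sample_satellite(sat, rows, cols)` yields a rough ground-only panorama. Because the flat-ground assumption ignores the heights of buildings and vegetation, such a projection distorts anything that rises above the ground; the abstract separately lists building heights among the data sources it examines.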


November 16, 2024