

Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

July 18, 2024
作者: Boyang Deng, Richard Tucker, Zhengqi Li, Leonidas Guibas, Noah Snavely, Gordon Wetzstein
cs.AI

Abstract

We present a method for generating Streetscapes: long sequences of views through an on-the-fly synthesized city-scale scene. Our generation is conditioned on language input (e.g., city name, weather), as well as an underlying map/layout hosting the desired trajectory. Compared to recent models for video generation or 3D view synthesis, our method can scale to much longer-range camera trajectories, spanning several city blocks, while maintaining visual quality and consistency. To achieve this goal, we build on recent work on video diffusion, used within an autoregressive framework that can easily scale to long sequences. In particular, we introduce a new temporal imputation method that prevents our autoregressive approach from drifting from the distribution of realistic city imagery. We train our Streetscapes system on a compelling source of data: posed imagery from Google Street View, along with contextual map data. This allows users to generate city views conditioned on any desired city layout, with controllable camera poses. Please see more results at our project page: https://boyangdeng.com/streetscapes.
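The abstract's core mechanism — generating a long video autoregressively in chunks, while using temporal imputation to keep each new chunk anchored to frames already generated — can be illustrated with a toy sketch. The following is a minimal, hypothetical illustration, not the authors' implementation: the denoiser is a stand-in, and all shapes, constants, and names are invented. At every denoising step, the frames that overlap the previous chunk are overwritten with forward-noised copies of the known frames, so the sampler cannot drift away from the established content.

```python
# Hypothetical sketch of autoregressive video diffusion with temporal
# imputation (all names, shapes, and constants are invented).
import numpy as np

rng = np.random.default_rng(0)
T_STEPS = 50   # denoising steps
CHUNK = 8      # frames generated per autoregressive step
OVERLAP = 2    # frames shared with the previous chunk
H = W = 16     # toy frame resolution

alphas = np.linspace(0.999, 0.95, T_STEPS)
alpha_bar = np.cumprod(alphas)

def toy_denoiser(x_t, t):
    """Stand-in for a learned video diffusion model: predicts a
    crude estimate of the clean chunk from the noisy one."""
    return x_t * np.sqrt(alpha_bar[t])

def generate_chunk(prev_frames):
    """Denoise one chunk; at every step, overwrite the overlapping
    frames with forward-noised copies of the known previous frames
    (temporal imputation), anchoring the new chunk to prior content."""
    x = rng.standard_normal((CHUNK, H, W))
    for t in reversed(range(T_STEPS)):
        if prev_frames is not None:
            # Impute: known frames, noised to the current level t.
            noise = rng.standard_normal((OVERLAP, H, W))
            x[:OVERLAP] = (np.sqrt(alpha_bar[t]) * prev_frames[-OVERLAP:]
                           + np.sqrt(1 - alpha_bar[t]) * noise)
        x0_hat = toy_denoiser(x, t)
        if t > 0:
            noise = rng.standard_normal(x.shape)
            x = (np.sqrt(alpha_bar[t - 1]) * x0_hat
                 + np.sqrt(1 - alpha_bar[t - 1]) * noise)
        else:
            x = x0_hat
    return x

# Autoregressive rollout: each step appends CHUNK - OVERLAP new frames.
video = generate_chunk(None)
for _ in range(3):
    new = generate_chunk(video)
    video = np.concatenate([video, new[OVERLAP:]], axis=0)

print(video.shape)  # (26, 16, 16): 8 initial + 3 * 6 new frames
```

The same rollout structure extends to arbitrarily long camera trajectories, since each step only ever conditions on a fixed-size window of previously generated frames.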

