GPS as a Control Signal for Image Generation

Abstract

We show that the GPS tags contained in photo metadata provide a useful control signal for image generation. We train GPS-to-image models and use them for tasks that require a fine-grained understanding of how images vary within a city. In particular, we train a diffusion model to generate images conditioned on both GPS and text. The learned model generates images that capture the distinctive appearance of different neighborhoods, parks, and landmarks. We also extract 3D models from 2D GPS-to-image models through score distillation sampling, using GPS conditioning to constrain the appearance of the reconstruction from each viewpoint. Our evaluations suggest that our GPS-conditioned models successfully learn to generate images that vary based on location, and that GPS conditioning improves the estimated 3D structure.
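As an illustration of what conditioning on GPS could look like in practice, the sketch below turns a latitude/longitude pair into a fixed-length feature vector that a diffusion model could consume alongside a text embedding. The abstract does not specify how coordinates are encoded, so everything here is an assumption: the bounding-box normalization, the sinusoidal (Fourier) features, and the names `normalize_gps`, `fourier_features`, and `BOUNDS` are all hypothetical.

```python
import math


def normalize_gps(lat, lon, bounds):
    """Map (lat, lon) in degrees to [-1, 1]^2 inside a bounding box.

    `bounds` is (lat_min, lat_max, lon_min, lon_max). This normalization is
    an assumption for illustration, not the encoding used by the paper.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    u = 2.0 * (lat - lat_min) / (lat_max - lat_min) - 1.0
    v = 2.0 * (lon - lon_min) / (lon_max - lon_min) - 1.0
    return u, v


def fourier_features(u, v, num_freqs=4):
    """Encode normalized coordinates with sines and cosines at doubling
    frequencies, so that nearby locations get distinguishable embeddings."""
    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for x in (u, v):
            feats.append(math.sin(freq * x))
            feats.append(math.cos(freq * x))
    return feats


# Hypothetical city bounding box (illustrative values only).
BOUNDS = (40.70, 40.88, -74.02, -73.91)

# A conditioning vector of length 4 * num_freqs for one photo location.
embedding = fourier_features(*normalize_gps(40.78, -73.96, BOUNDS))
```

In a setup like this, the resulting vector would typically be projected by a small learned network and combined with the text conditioning, but that design is likewise an assumption rather than something the abstract states.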