UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video

Abstract

We show how to build a model that allows realistic, free-viewpoint renderings of a scene under novel lighting conditions from video. Our method, UrbanIR (Urban Scene Inverse Rendering), computes an inverse graphics representation from the video. UrbanIR jointly infers shape, albedo, visibility, and sun and sky illumination from a single video of unbounded outdoor scenes with unknown lighting. UrbanIR uses videos from cameras mounted on cars, in contrast to the many views of the same points available in typical NeRF-style estimation. As a result, standard methods produce poor geometry estimates (for example, of roofs) and numerous "floaters". Errors in inverse graphics inference can result in strong rendering artifacts. UrbanIR uses novel losses to control these and other sources of error, including a novel loss that yields very good estimates of shadow volumes in the original scene. The resulting representations facilitate controllable editing, delivering photorealistic free-viewpoint renderings of relit scenes and inserted objects. Qualitative evaluation demonstrates strong improvements over the state of the art.
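To make the role of the inferred quantities concrete, the sketch below shows one common way that albedo, surface normal, sun visibility, and sun and sky illumination can be recombined to shade a single surface point under new lighting. It is an illustrative assumption, not the formulation used by UrbanIR: it assumes a Lambertian surface, a single directional sun, and a constant ambient sky term, and the function and parameter names (`relight`, `sun_visibility`, `sky_color`) are hypothetical.

```python
from math import sqrt


def normalize(v):
    """Return v scaled to unit length."""
    length = sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def relight(albedo, normal, sun_dir, sun_visibility, sun_color, sky_color):
    """Shade one surface point (hypothetical Lambertian sun-plus-sky model).

    albedo, sun_color, sky_color: RGB triples.
    normal, sun_dir: 3D vectors (normalized here).
    sun_visibility: 0.0 if the point is in shadow, 1.0 if fully lit.
    """
    n = normalize(normal)
    l = normalize(sun_dir)
    # Cosine of the angle between normal and sun direction, clamped at zero.
    cos_term = max(0.0, sum(a * b for a, b in zip(n, l)))
    # Direct sun contribution is gated by visibility; the sky term is ambient.
    return tuple(
        a * (sun_visibility * cos_term * s + k)
        for a, s, k in zip(albedo, sun_color, sky_color)
    )


# Same point, lit and then shadowed: only the sky term survives in shadow.
lit = relight((0.5, 0.5, 0.5), (0, 0, 1), (0, 0, 1), 1.0, (1, 1, 1), (0.2, 0.2, 0.2))
shadowed = relight((0.5, 0.5, 0.5), (0, 0, 1), (0, 0, 1), 0.0, (1, 1, 1), (0.2, 0.2, 0.2))
```

Under a model of this kind, changing `sun_dir` or `sun_color` relights the point, and an error in `sun_visibility` shows up directly as a misplaced shadow, which is one way to see why accurate shadow estimates matter for relighting.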
