Neural Scene Chronology

Abstract

In this work, we aim to reconstruct a time-varying 3D model from Internet photos of large-scale landmarks, capable of producing photo-realistic renderings with independent control of viewpoint, illumination, and time. The core challenges are twofold. First, different types of temporal change, such as illumination and changes to the underlying scene itself (for example, one graffiti artwork being replaced with another), are entangled in the imagery. Second, scene-level temporal changes are often discrete and sporadic rather than continuous. To tackle these problems, we propose a new scene representation equipped with a novel temporal step function encoding method that can model discrete scene-level content changes as piecewise constant functions over time. Specifically, we represent the scene as a space-time radiance field with a per-image illumination embedding, where temporally varying scene changes are encoded using a set of learned step functions. To facilitate our task of chronology reconstruction from Internet imagery, we also collect a new dataset of four scenes that exhibit various changes over time. We demonstrate that our method achieves state-of-the-art view synthesis results on this dataset while providing independent control of viewpoint, time, and illumination.
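
As an illustration of the step function encoding described above, the following Python sketch maps a timestamp to a vector of soft step activations. The logistic form of `soft_step`, the names `soft_step` and `encode_time`, the steepness `beta`, and the fixed transition times are all assumptions made for this example. In the method described above the step functions are learned, and their exact parameterization is not given in this abstract.

```python
import math


def soft_step(t, u, beta):
    """Smooth stand-in for a Heaviside step at transition time u.

    Assumption: a logistic curve is used here for illustration only.
    Smaller beta gives a sharper transition.
    """
    z = (t - u) / beta
    # Numerically stable logistic: never exponentiate a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode_time(t, transitions, beta=1e-3):
    """Encode timestamp t as one soft step activation per transition time.

    Each entry is close to 0 before its transition and close to 1 after it,
    so the encoding is approximately piecewise constant in t.
    """
    return [soft_step(t, u, beta) for u in transitions]


# Hypothetical transition times on a normalized [0, 1] timeline.
transitions = [0.25, 0.6]
for t in (0.1, 0.3, 0.5, 0.9):
    print(t, [round(v, 3) for v in encode_time(t, transitions)])
```

Timestamps that fall between the same pair of transitions receive nearly identical codes, so scene content conditioned on this encoding stays constant between transitions and changes abruptly at a transition. That matches the discrete, sporadic scene-level changes the abstract describes.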
