Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

Abstract

We introduce a diffusion-based framework that generates aligned novel-view images and geometry through a warping-and-inpainting methodology. Unlike prior methods that require dense posed images or pose-embedded generative models limited to in-domain views, our method leverages off-the-shelf geometry predictors to predict the partial geometry visible from the reference images, and formulates novel-view synthesis as an inpainting task for both image and geometry. To ensure accurate alignment between the generated images and geometry, we propose cross-modal attention instillation, in which attention maps from the image diffusion branch are injected into a parallel geometry diffusion branch during both training and inference. This multi-task approach yields a synergistic effect, promoting both geometrically robust image synthesis and well-defined geometry prediction. We further introduce proximity-based mesh conditioning, which integrates depth and normal cues, interpolates across the point cloud, and filters out erroneously predicted geometry so that it does not influence the generation process. Empirically, our method achieves high-fidelity extrapolative view synthesis for both image and geometry across a range of unseen scenes, delivers competitive reconstruction quality under interpolation settings, and produces geometrically aligned colored point clouds for comprehensive 3D completion. The project page is available at https://cvlab-kaist.github.io/MoAI.
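To make the attention-sharing idea concrete (attention maps from the image branch injected into the parallel geometry branch), the following is a minimal NumPy sketch under simplifying assumptions. It is not the authors' implementation: it uses a single attention head over flattened spatial tokens, assumes the injected map fully replaces the geometry branch's own attention map, and uses random placeholder weights. The function names (`image_branch`, `geometry_branch`) are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_map(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention weights of shape (n_tokens, n_tokens)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)


def image_branch(tokens, w_q, w_k, w_v):
    """Image branch (hypothetical): computes its own attention map and
    returns it alongside the attended output."""
    attn = attention_map(tokens @ w_q, tokens @ w_k)
    return attn @ (tokens @ w_v), attn


def geometry_branch(tokens, w_v, injected_attn):
    """Geometry branch (hypothetical, assumes full replacement): skips its own
    query/key projections and aggregates its value vectors with the attention
    map injected from the image branch, so both modalities attend to the same
    spatial locations."""
    return injected_attn @ (tokens @ w_v)


# Toy example: 6 spatial tokens with 8 channels per modality.
rng = np.random.default_rng(0)
n_tokens, dim = 6, 8
image_tokens = rng.standard_normal((n_tokens, dim))
geometry_tokens = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v_img, w_v_geo = (rng.standard_normal((dim, dim)) for _ in range(4))

image_out, shared_attn = image_branch(image_tokens, w_q, w_k, w_v_img)
geometry_out = geometry_branch(geometry_tokens, w_v_geo, shared_attn)
print(image_out.shape, geometry_out.shape)  # (6, 8) (6, 8)
```

In this sketch, sharing the attention map rather than the features lets each branch keep modality-specific values (color versus depth or normals) while forcing both to aggregate information from the same spatial positions, which is one plausible way such injection encourages image-geometry alignment.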