Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

Abstract


Recent advances in DUSt3R have enabled robust estimation of dense point clouds and camera parameters of static scenes, leveraging Transformer network architectures and direct supervision on large-scale 3D datasets. In contrast, the limited scale and diversity of available 4D datasets present a major bottleneck for training a highly generalizable 4D model. This constraint has driven conventional 4D methods to fine-tune 3D models on scalable dynamic video data with additional geometric priors such as optical flow and depth. In this work, we take the opposite path and introduce Easi3R, a simple yet efficient training-free method for 4D reconstruction. Our approach applies attention adaptation during inference, eliminating the need for from-scratch pre-training or network fine-tuning. We find that the attention layers in DUSt3R inherently encode rich information about camera and object motion. By carefully disentangling these attention maps, we achieve accurate dynamic region segmentation, camera pose estimation, and 4D dense point map reconstruction. Extensive experiments on real-world dynamic videos demonstrate that our lightweight attention adaptation significantly outperforms previous state-of-the-art methods that are trained or fine-tuned on extensive dynamic datasets. Our code is publicly available for research purposes at https://easi3r.github.io/.
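To make the idea of "reading motion out of attention maps" concrete, here is a toy sketch under loudly stated assumptions. It assumes that, averaged over many frame pairs, tokens on moving objects receive systematically less cross-attention than static background tokens, and flags those low-attention tokens as dynamic. The functions `received_attention` and `dynamic_mask`, the nested-list tensor layout `attn[h][q][k]`, and the 0.5 threshold are all hypothetical illustrations; this is not Easi3R's actual aggregation rule, which is defined in the paper and its released code.

```python
from statistics import mean


def received_attention(attn):
    """Average attention each key token receives in one frame pair.

    Hypothetical layout: attn[h][q][k] is the weight that query token q
    (from one view) assigns to key token k (from the other view) in head h.
    """
    num_keys = len(attn[0][0])
    # Average over every head and every query row for each key column.
    return [
        mean(row[k] for head in attn for row in head)
        for k in range(num_keys)
    ]


def dynamic_mask(attn_maps, threshold=0.5):
    """Flag key tokens that receive persistently low attention as dynamic.

    Hypothetical rule for illustration only (not Easi3R's formulation):
    average the received attention over all frame pairs, min-max normalize
    it to [0, 1], and mark tokens below `threshold` as dynamic.
    """
    per_pair = [received_attention(a) for a in attn_maps]
    num_keys = len(per_pair[0])
    avg = [mean(p[k] for p in per_pair) for k in range(num_keys)]
    lo, hi = min(avg), max(avg)
    if hi == lo:
        # Uniform attention carries no motion cue; treat everything as static.
        return [False] * num_keys
    return [(v - lo) / (hi - lo) < threshold for v in avg]


# Toy example: 2 frame pairs, 1 head, 2 query tokens, 3 key tokens.
# Key token 2 is barely attended to in either pair.
pair_a = [[[0.45, 0.45, 0.10], [0.50, 0.40, 0.10]]]
pair_b = [[[0.40, 0.50, 0.10], [0.45, 0.45, 0.10]]]
print(dynamic_mask([pair_a, pair_b]))  # [False, False, True]
```

Min-max normalization makes the threshold relative to each clip rather than an absolute attention value; a real pipeline would additionally have to map token indices back to image patches to obtain a pixel-level segmentation.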
