Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning
November 3, 2025
Authors: Mengtan Zhang, Zizhan Guo, Hongbo Zhao, Yi Feng, Zuyi Xiong, Yue Wang, Shaoyi Du, Hanli Wang, Rui Fan
cs.AI
Abstract
Unsupervised learning of depth and ego-motion, two fundamental 3D perception
tasks, has made significant strides in recent years. However, most methods
treat ego-motion as an auxiliary task, either mixing all motion types or
excluding depth-independent rotational motion from supervision. Such designs
limit the incorporation of strong geometric constraints, reducing reliability
and robustness under diverse conditions. This study introduces a discriminative
treatment of motion components, leveraging the geometric regularities of their
respective rigid flows to benefit both depth and ego-motion estimation. Given
consecutive video frames, the network outputs are first used to align the optical axes and
imaging planes of the source and target cameras. Optical flows between frames
are transformed through these alignments, and deviations are quantified to
impose geometric constraints individually on each ego-motion component,
enabling more targeted refinement. These alignments further reformulate the
joint learning process into coaxial and coplanar forms, where depth and each
translation component can be mutually derived through closed-form geometric
relationships, introducing complementary constraints that improve depth
robustness. DiMoDE, a general depth and ego-motion joint learning framework
incorporating these designs, achieves state-of-the-art performance on multiple
public datasets and a newly collected diverse real-world dataset, particularly
under challenging conditions. Our source code will be publicly available at
mias.group/DiMoDE upon publication.
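
To make the coaxial and coplanar reformulation concrete, the Python (NumPy) sketch below illustrates the two geometric facts the abstract relies on: (i) the image motion induced by pure rotation is depth-independent, which is what aligning the optical axes and imaging planes factors out, and (ii) once rotation is removed, each translation component relates to depth through a closed-form expression. This is a minimal illustration under my own assumptions (a pinhole intrinsic matrix K, a relative pose with P_src = R P_tgt + t, and pixel-space flow values); it is not the authors' DiMoDE implementation, and all names and parameters here are hypothetical.

import numpy as np

def rotational_flow(K, R, h, w):
    """Depth-independent flow induced by pure rotation: p_src ~ K R K^{-1} p_tgt."""
    u, v = np.meshgrid(np.arange(w), np.arange(h))                 # pixel grid, shape (h, w)
    p = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    H = K @ R @ np.linalg.inv(K)                                   # rotation-only homography
    p_rot = p @ H.T                                                # warp every pixel
    p_rot = p_rot[..., :2] / p_rot[..., 2:3]                       # perspective division
    return p_rot - p[..., :2]                                      # (h, w, 2) rotational flow

def depth_from_axial_flow(flow_x, u, K, t_z):
    """Coaxial case: translation along the aligned optical axis only.
    With normalized coordinate x = (u - cx)/fx, projection gives
    x_src = x * Z / (Z + t_z), hence Z = x_src * t_z / (x - x_src)."""
    fx, cx = K[0, 0], K[0, 2]
    x = (u - cx) / fx
    x_src = x + flow_x / fx                                        # normalized source coordinate
    return x_src * t_z / (x - x_src + 1e-8)

def depth_from_inplane_flow(flow_x, K, t_x):
    """Coplanar case: translation parallel to the image plane only.
    x_src = x + t_x / Z, hence Z = t_x / (x_src - x) = fx * t_x / flow_x."""
    fx = K[0, 0]
    return fx * t_x / (flow_x + 1e-8)

# Hypothetical usage (all values illustrative): subtract the rotation-induced flow
# from the observed flow, then read depth off the residual translational flow.
#   flow_rot = rotational_flow(K, R_est, h, w)
#   residual = flow_obs - flow_rot
#   Z = depth_from_inplane_flow(residual[..., 0], K, t_x=0.3)

These closed-form depth/translation couplings are the kind of complementary constraints the abstract describes: deviations between the observed flow and the flow implied by each aligned motion component can be quantified and penalized per component, rather than through a single mixed supervision signal.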