

DeFM: Learning Foundation Representations from Depth for Robotics

January 26, 2026
Authors: Manthan Patel, Jonas Frey, Mayank Mittal, Fan Yang, Alexander Hansson, Amir Bar, Cesar Cadena, Marco Hutter
cs.AI

Abstract

Depth sensors are widely deployed across robotic platforms, and advances in fast, high-fidelity depth simulation have enabled robotic policies trained on depth observations to achieve robust sim-to-real transfer for a wide range of tasks. Despite this, representation learning for the depth modality remains underexplored compared to RGB, where large-scale foundation models now define the state of the art. To address this gap, we present DeFM, a self-supervised foundation model trained entirely on depth images for robotic applications. Using a DINO-style self-distillation objective on a curated dataset of 60M depth images, DeFM learns geometric and semantic representations that generalize to diverse environments, tasks, and sensors. To retain metric awareness across multiple scales, we introduce a novel input normalization strategy. We further distill DeFM into compact models suitable for resource-constrained robotic systems. When evaluated on depth-based classification, segmentation, navigation, locomotion, and manipulation benchmarks, DeFM achieves state-of-the-art performance and demonstrates strong generalization from simulation to real-world environments. We release all our pretrained models, which can be adopted off the shelf for depth-based robotic learning without task-specific fine-tuning. Webpage: https://de-fm.github.io/
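The abstract highlights an input normalization strategy that preserves metric awareness across scales, but does not describe it on this page. Below is a minimal illustrative sketch of one common way raw metric depth is normalized before being fed to a vision encoder; the function name, clipping range, and log-scaling choice are assumptions for illustration only, not the normalization scheme proposed in the paper.

```python
import numpy as np

def normalize_depth(depth_m, d_min=0.1, d_max=20.0):
    """Illustrative depth normalization (an assumption, not DeFM's actual scheme).

    Clips metric depth to a fixed working range and applies a log mapping so
    near-range structure keeps resolution while far-range values are compressed,
    then rescales the result to [0, 1] for the encoder.
    """
    d = np.clip(depth_m, d_min, d_max)
    d = np.log(d / d_min) / np.log(d_max / d_min)  # log-scale to [0, 1]
    return d.astype(np.float32)

# Example: a synthetic 480x640 depth frame in meters
depth = np.random.uniform(0.1, 20.0, size=(480, 640))
x = normalize_depth(depth)
print(x.min(), x.max())  # values lie in [0, 1]
```

A normalized tensor like this would typically be patchified and passed to the pretrained encoder as a single-channel input; consult the project webpage for the authors' released models and their actual preprocessing.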