Orient Anything V2: Unifying Orientation and Rotation Understanding

Abstract

This work presents Orient Anything V2, an enhanced foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Building upon Orient Anything V1, which defines orientation via a single unique front face, V2 extends this capability to handle objects with diverse rotational symmetries and directly estimate relative rotations. These improvements are enabled by four key innovations: 1) Scalable 3D assets synthesized by generative models, ensuring broad category coverage and balanced data distribution; 2) An efficient, model-in-the-loop annotation system that robustly identifies 0 to N valid front faces for each object; 3) A symmetry-aware, periodic distribution fitting objective that captures all plausible front-facing orientations, effectively modeling object rotational symmetry; 4) A multi-frame architecture that directly predicts relative object rotations. Extensive experiments show that Orient Anything V2 achieves state-of-the-art zero-shot performance on orientation estimation, 6DoF pose estimation, and object symmetry recognition across 11 widely used benchmarks. The model demonstrates strong generalization, significantly broadening the applicability of orientation estimation in diverse downstream tasks.
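To make the third innovation concrete, the following is a minimal sketch of what a symmetry-aware periodic target over azimuth could look like. It is an illustration under stated assumptions, not the authors' implementation: the von Mises-style form, the 360-bin discretization, the concentration `kappa`, and the helper names `periodic_target` and `front_faces` are all hypothetical. An object with N equivalent front faces gets a distribution with N equally spaced peaks, and an object with no meaningful front (N = 0) gets a uniform distribution.

```python
import numpy as np


def periodic_target(front_deg, n_fold, bins=360, kappa=20.0):
    """Hypothetical target distribution over azimuth bins.

    front_deg: azimuth (degrees) of one valid front face.
    n_fold:    number of equivalent front faces (0 means no defined front).
    kappa:     assumed concentration; larger values give sharper peaks.
    """
    if n_fold == 0:
        # No preferred front: every azimuth is equally plausible.
        return np.full(bins, 1.0 / bins)
    angles = np.deg2rad(np.arange(bins) * 360.0 / bins)
    # cos(n * (theta - theta0)) repeats every 360/n degrees, so the
    # distribution has n identical peaks at theta0 + k * 360/n.
    logits = kappa * np.cos(n_fold * (angles - np.deg2rad(front_deg)))
    p = np.exp(logits - logits.max())
    return p / p.sum()


def front_faces(p):
    """Return the bin indices that are strict circular local maxima."""
    left = np.roll(p, 1)
    right = np.roll(p, -1)
    return [int(i) for i in np.flatnonzero((p > left) & (p > right))]


if __name__ == "__main__":
    p = periodic_target(front_deg=30, n_fold=4)
    print(front_faces(p))  # four peaks spaced 90 degrees apart
```

In this sketch, counting the peaks of a predicted distribution recovers the symmetry order, and the peak locations give every plausible front-facing azimuth, which is one way a single output can serve both orientation estimation and symmetry recognition.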