
MARCO: Navigating the Unseen Space of Semantic Correspondence

April 20, 2026
Authors: Claudia Cuttano, Gabriele Trivigno, Carlo Masone, Stefan Roth
cs.AI

Abstract

Recent advances in semantic correspondence rely on dual-encoder architectures that combine DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework that expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that grow at fine-grained localization thresholds (+8.9 PCK@0.01) and the strongest generalization to unseen keypoints (+5.1 on SPair-U) and unseen categories (+4.7 on MP-100), while remaining 3x smaller and 10x faster than diffusion-based approaches. Code is available at https://github.com/visinf/MARCO.
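The PCK@0.01 figure quoted above refers to the Percentage of Correct Keypoints metric at a strict threshold. As a rough illustration only, the sketch below computes PCK under the common convention that a prediction counts as correct when it lies within alpha times the longer side of the object bounding box. The function name `pck`, the bounding-box normalization, and the toy coordinates are assumptions made for illustration; they are not taken from the MARCO code base, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def pck(pred, gt, bbox_size, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(bbox_size) of the ground truth.

    pred, gt: arrays of shape (N, 2) holding (x, y) pixel coordinates.
    bbox_size: (height, width) of the object bounding box (assumed normalizer).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    threshold = alpha * max(bbox_size)          # distance budget in pixels
    dists = np.linalg.norm(pred - gt, axis=1)   # per-keypoint Euclidean error
    return float(np.mean(dists <= threshold))


# Toy data (hypothetical): per-keypoint errors of 1, 5, and 30 pixels.
gt = [[100, 100], [200, 150], [50, 60]]
pred = [[101, 100], [205, 150], [50, 90]]
bbox = (200, 400)  # longer side is 400 px

print(pck(pred, gt, bbox, alpha=0.1))   # threshold 40 px: all three count as correct
print(pck(pred, gt, bbox, alpha=0.01))  # threshold 4 px: only the first counts
```

In this toy case the same predictions score 1.0 at alpha = 0.1 but only one third at alpha = 0.01, which is why gains at tight thresholds are read as evidence of fine-grained localization rather than coarse matching.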