Edge Weight Prediction For Category-Agnostic Pose Estimation

Abstract

Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model, using one or a few annotated support images. Recent work has shown that using a pose graph (i.e., treating keypoints as nodes in a graph rather than as isolated points) helps handle occlusions and break symmetry. However, these methods assume a static pose graph with equal-weight edges, which leads to suboptimal results. We introduce EdgeCape, a novel framework that overcomes these limitations by predicting the graph's edge weights to optimize localization. To further leverage structural priors, we propose integrating Markovian Structural Bias, which modulates the self-attention interaction between nodes based on the number of hops between them. We show that this improves the model's ability to capture global spatial dependencies. Evaluated on the MP-100 benchmark, which includes 100 categories and over 20K images, EdgeCape achieves state-of-the-art results in the 1-shot setting and leads among similar-sized methods in the 5-shot setting, significantly improving keypoint localization accuracy. Our code is publicly available.
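To make the idea of hop-based modulation concrete, here is a minimal sketch of self-attention weights biased by graph hop distance. It is an illustration under stated assumptions, not EdgeCape's implementation: the abstract does not give the form of the Markovian Structural Bias, so the linear per-hop `penalty`, the function names `hop_distances` and `hop_biased_attention`, and the unweighted adjacency-list input are all hypothetical. The predicted edge weights are not modeled here.

```python
import math
from collections import deque


def hop_distances(adj):
    """All-pairs hop counts on an unweighted graph, via BFS from every node.

    adj is an adjacency list: adj[i] lists the neighbours of node i.
    Unreachable pairs are assigned math.inf.
    """
    n = len(adj)
    dist = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[src][w] == math.inf:
                    dist[src][w] = dist[src][u] + 1
                    queue.append(w)
    return dist


def hop_biased_attention(q, k, adj, penalty=0.5):
    """Softmax attention weights with an additive hop-count bias on the logits.

    Assumption (not from the paper): the bias is -penalty * hops, so
    keypoints further apart in the pose graph attend to each other less.
    """
    d = len(q[0])
    hops = hop_distances(adj)
    weights = []
    for i, qi in enumerate(q):
        logits = []
        for j, kj in enumerate(k):
            score = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            # Unreachable pairs are masked out entirely.
            bias = -math.inf if hops[i][j] == math.inf else -penalty * hops[i][j]
            logits.append(score + bias)
        m = max(logits)  # finite, because hops[i][i] == 0
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Usage: a 4-keypoint chain 0-1-2-3 with identical (all-zero) features,
# so any difference in attention comes from the hop bias alone.
chain = [[1], [0, 2], [1, 3], [2]]
feats = [[0.0] * 8 for _ in range(4)]
attn = hop_biased_attention(feats, feats, chain)
```

With identical features the content scores are all equal, so each row of `attn` decays with hop distance from the query keypoint. A learned or edge-weight-dependent bias would replace the fixed `penalty` in a real model.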
