Doppelgangers: Learning to Disambiguate Images of Similar Structures

Abstract

We consider the visual disambiguation task of determining whether a pair of visually similar images depict the same or distinct 3D surfaces (e.g., the same or opposite sides of a symmetric building). Illusory image matches, where two images observe distinct but visually similar 3D surfaces, can be challenging for humans to differentiate, and can also lead 3D reconstruction algorithms to produce erroneous results. We propose a learning-based approach to visual disambiguation, formulating it as a binary classification task on image pairs. To that end, we introduce a new dataset for this problem, Doppelgangers, which includes image pairs of similar structures with ground truth labels. We also design a network architecture that takes the spatial distribution of local keypoints and matches as input, allowing for better reasoning about both local and global cues. Our evaluation shows that our method can distinguish illusory matches in difficult cases, and can be integrated into structure-from-motion (SfM) pipelines to produce correct, disambiguated 3D reconstructions. See our project page for our code, datasets, and more results: http://doppelgangers-3d.github.io/.
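As a rough illustration of the formulation above, the sketch below rasterizes keypoints and matches into binary masks (one possible way to expose their spatial distribution to a classifier) and then keeps only the image pairs whose predicted same-surface probability clears a threshold before they are handed to an SfM pipeline. The mask layout, the function names `rasterize`, `pair_masks`, and `filter_pairs`, and the 0.5 threshold are all assumptions made for this example; the abstract does not specify them, and the classifier network itself is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def rasterize(points: Sequence[Point], height: int, width: int) -> List[List[int]]:
    """Return a height x width binary mask with 1 at each (rounded) point."""
    mask = [[0] * width for _ in range(height)]
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:  # ignore out-of-bounds points
            mask[row][col] = 1
    return mask


def pair_masks(
    kps_a: Sequence[Point],
    kps_b: Sequence[Point],
    matches: Sequence[Tuple[int, int]],
    shape_a: Tuple[int, int],
    shape_b: Tuple[int, int],
) -> Dict[str, List[List[int]]]:
    """Build keypoint and match masks for both images of a pair.

    `matches` holds (i, j) index pairs: keypoint i in image A matches
    keypoint j in image B. Shapes are (height, width).
    """
    matched_a = [kps_a[i] for i, _ in matches]
    matched_b = [kps_b[j] for _, j in matches]
    return {
        "keypoints_a": rasterize(kps_a, *shape_a),
        "keypoints_b": rasterize(kps_b, *shape_b),
        "matches_a": rasterize(matched_a, *shape_a),
        "matches_b": rasterize(matched_b, *shape_b),
    }


def filter_pairs(
    scores: Dict[Tuple[str, str], float], threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """Keep image pairs whose predicted same-surface probability is >= threshold.

    `scores` is assumed to come from a binary pair classifier (hypothetical here).
    """
    return [pair for pair, prob in scores.items() if prob >= threshold]
```

In this sketch, the masks would be stacked with the two images as classifier input, and only the pairs returned by `filter_pairs` would be passed on to reconstruction; the threshold trades off discarding illusory matches against discarding correct ones.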
