Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

Abstract

We study the task of establishing object-level visual correspondence across different viewpoints in videos, focusing on the challenging egocentric-to-exocentric and exocentric-to-egocentric scenarios. We propose a simple yet effective framework based on conditional binary segmentation, where an object query mask is encoded into a latent representation to guide the localization of the corresponding object in a target video. To encourage robust, view-invariant representations, we introduce a cycle-consistency training objective: the predicted mask in the target view is projected back to the source view to reconstruct the original query mask. This bidirectional constraint provides a strong self-supervisory signal without requiring ground-truth annotations and enables test-time training (TTT) at inference. Experiments on the Ego-Exo4D and HANDAL-X benchmarks demonstrate the effectiveness of our optimization objective and TTT strategy, achieving state-of-the-art performance. The code is available at https://github.com/shannany0606/CCMP.
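
The cycle-consistency objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `predict` callable, its argument order, the use of one predictor for both directions, the binary cross-entropy loss, and the flat-list mask layout are all assumptions made for illustration. A real model would produce soft masks with a differentiable network so that gradients flow through both passes.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two flat lists of values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)

def cycle_consistency_loss(predict, query_mask, src_frame, tgt_frame):
    """Source -> target -> source reconstruction loss for one frame pair.

    `predict(cond_frame, cond_mask, out_frame)` is a hypothetical callable
    that returns a soft mask for `out_frame`, conditioned on a mask given
    in `cond_frame`. No target-view annotation is needed.
    """
    # Forward pass: locate the queried object in the target view.
    tgt_mask = predict(src_frame, query_mask, tgt_frame)
    # Backward pass: project the prediction back into the source view.
    recon_mask = predict(tgt_frame, tgt_mask, src_frame)
    # The reconstruction should match the original query mask.
    return bce(recon_mask, query_mask)

# Toy stand-ins for a learned model (assumptions, for demonstration only).
def flip_predictor(cond_frame, cond_mask, out_frame):
    # Pretends the two views are mirror images of each other.
    return list(reversed(cond_mask))

def blind_predictor(cond_frame, cond_mask, out_frame):
    # Ignores its inputs and is maximally uncertain everywhere.
    return [0.5] * len(cond_mask)

query = [1.0, 1.0, 0.0, 0.0]
loss_consistent = cycle_consistency_loss(flip_predictor, query, None, None)
loss_blind = cycle_consistency_loss(blind_predictor, query, None, None)
```

In this toy setup the mirrored predictor reconstructs the query exactly, so its loss is close to zero, while the uninformative predictor incurs a loss of ln 2 per pixel. Because the loss depends only on the query mask, the same quantity could in principle be minimized on test pairs, which is the idea behind the test-time training mentioned in the abstract.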
