CLIPSym: Delving into Symmetry Detection with CLIP
August 19, 2025
Authors: Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh
cs.AI
Abstract
Symmetry is one of the most fundamental geometric cues in computer vision, and detecting it has been an ongoing challenge. With the recent advances in vision-language models, i.e., CLIP, we investigate whether a pre-trained CLIP model can aid symmetry detection by leveraging the additional symmetry cues found in natural image descriptions. We propose CLIPSym, which leverages CLIP's image and language encoders and a rotation-equivariant decoder based on a hybrid of Transformer and G-Convolution to detect rotation and reflection symmetries. To fully utilize CLIP's language encoder, we develop a novel prompting technique called Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based prompts to better integrate the semantic cues for symmetry detection. Empirically, we show that CLIPSym outperforms the current state-of-the-art on three standard symmetry detection datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations verifying the benefits of CLIP's pre-training, the proposed equivariant decoder, and the SAPG technique. The code is available at https://github.com/timyoung2333/CLIPSym.
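
The abstract does not spell out how SAPG aggregates its object-based prompts. The sketch below is a rough illustration only, not the authors' implementation: it encodes a handful of hypothetical object prompts with an off-the-shelf CLIP text encoder (via Hugging Face transformers) and mean-pools the embeddings within semantic groups. The group names, prompt templates, and mean-pooling rule are assumptions made for illustration.

```python
# Rough sketch, NOT the authors' SAPG implementation: encode a diverse set of
# object-based prompts with CLIP's language encoder and aggregate them within
# semantic groups. Group names, prompt templates, and the mean-pooling rule
# are assumptions for illustration.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical groups of frequent objects that often exhibit symmetry.
prompt_groups = {
    "man-made": ["a photo of a wheel", "a photo of a building", "a photo of a bottle"],
    "natural": ["a photo of a butterfly", "a photo of a leaf", "a photo of a face"],
}

group_embeddings = {}
with torch.no_grad():
    for name, prompts in prompt_groups.items():
        tokens = tokenizer(prompts, padding=True, return_tensors="pt")
        feats = model.get_text_features(**tokens)           # (num_prompts, dim)
        feats = feats / feats.norm(dim=-1, keepdim=True)    # unit-normalize, as CLIP does
        group_embeddings[name] = feats.mean(dim=0)          # aggregate prompts per group

print({name: emb.shape for name, emb in group_embeddings.items()})
```

Likewise, the rotation-equivariant decoder is described only as a hybrid of Transformer and G-Convolution. The toy layer below is not that decoder; it merely demonstrates the basic G-Convolution property such a decoder relies on: applying one filter under the four 90-degree rotations yields features that transform predictably when the input image is rotated.

```python
# Toy C4 (90-degree rotation) group convolution, NOT the CLIPSym decoder:
# the same filter is applied under four rotations, so rotating the input
# rotates the output and cyclically shifts its four rotation channel blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class C4LiftingConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x):
        # Stack the filter rotated by 0, 90, 180, and 270 degrees.
        rotated = [torch.rot90(self.weight, r, dims=(2, 3)) for r in range(4)]
        w = torch.cat(rotated, dim=0)                        # (4*out_ch, in_ch, k, k)
        return F.conv2d(x, w, padding=self.weight.shape[-1] // 2)

conv = C4LiftingConv(in_ch=3, out_ch=8)
x = torch.randn(1, 3, 32, 32)
y_then_rotate = torch.rot90(conv(x), 1, dims=(2, 3))
rotate_then_y = conv(torch.rot90(x, 1, dims=(2, 3)))
# The two results agree up to a cyclic shift of the four rotation blocks of channels.
matches = [torch.allclose(torch.roll(rotate_then_y, -s * 8, dims=1), y_then_rotate, atol=1e-4)
           for s in range(4)]
print(matches)  # exactly one entry is True
```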