CLIPSym: Delving into Symmetry Detection with CLIP
August 19, 2025
Authors: Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh
cs.AI
Abstract
Symmetry is one of the most fundamental geometric cues in computer vision,
and detecting it has been an ongoing challenge. With the recent advances in
vision-language models such as CLIP, we investigate whether a pre-trained CLIP
model can aid symmetry detection by leveraging the additional symmetry cues
found in the natural image descriptions. We propose CLIPSym, which leverages
CLIP's image and language encoders and a rotation-equivariant decoder based on
a hybrid of Transformer and G-Convolution to detect rotation and reflection
symmetries. To fully utilize CLIP's language encoder, we have developed a novel
prompting technique called Semantic-Aware Prompt Grouping (SAPG), which
aggregates a diverse set of frequent object-based prompts to better integrate
the semantic cues for symmetry detection. Empirically, we show that CLIPSym
outperforms the current state-of-the-art on three standard symmetry detection
datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations
verifying the benefits of CLIP's pre-training, the proposed equivariant
decoder, and the SAPG technique. The code is available at
https://github.com/timyoung2333/CLIPSym.
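
The abstract relies on the notion of a rotation-equivariant decoder: rotating the input should rotate the output in the same way. The snippet below is a toy NumPy illustration of that property for 90-degree rotations only, not the CLIPSym decoder. The helper names (`correlate_circular`, `c4_pooled_response`, `is_rot90_equivariant`), the Sobel filter, and the max-pooling over orientations are all assumptions chosen for illustration. A single oriented filter is not equivariant, while pooling the responses to its four rotated copies (the basic idea behind group convolution over the C4 rotation group) is.

```python
import numpy as np


def correlate_circular(image, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel using wrap-around padding."""
    out = np.zeros_like(image, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # np.roll with shift (-di, -dj) places image[i + di, j + dj] at (i, j).
            shifted = np.roll(image, shift=(-di, -dj), axis=(0, 1))
            out += kernel[di + 1, dj + 1] * shifted
    return out


def c4_pooled_response(image, kernel):
    """Max over the responses to the four 90-degree rotations of the kernel."""
    responses = [correlate_circular(image, np.rot90(kernel, r)) for r in range(4)]
    return np.max(responses, axis=0)


def is_rot90_equivariant(fn, image):
    """Check whether fn(rotate(image)) equals rotate(fn(image))."""
    return np.allclose(fn(np.rot90(image)), np.rot90(fn(image)))


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))  # square, so rot90 keeps the shape

# A horizontally oriented edge filter; it is not invariant to 90-degree rotation.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

plain_is_equivariant = is_rot90_equivariant(
    lambda x: correlate_circular(x, sobel_x), image)
pooled_is_equivariant = is_rot90_equivariant(
    lambda x: c4_pooled_response(x, sobel_x), image)

print("single oriented filter equivariant:", plain_is_equivariant)
print("C4-pooled filter bank equivariant:", pooled_is_equivariant)
```

The wrap-around padding and square input are deliberate simplifications: they make 90-degree rotation an exact symmetry of the grid, so the check holds up to floating-point error.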