BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding

Abstract

Learning representations of CAD models is a largely open problem. While 3D representation learning has flourished around point clouds and meshes, the native format of CAD, the boundary representation (BRep), which encodes exact parametric surfaces, curves, and their topology, has received little attention as a substrate for representation learning. We introduce BRepCLIP, the first framework to align BRep geometry with language and image embeddings through contrastive pretraining. We model each CAD object as a sequence of face and edge tokens with separate discrete vocabularies for surface and curve geometry, augmented with spatial and semantic descriptors that capture surface types (e.g., cylindrical, torus, NURBS) and curve primitives (e.g., line, arc, B-spline). A transformer encoder aggregates these tokens into a global BRep embedding, which is aligned with CLIP's text and image encoders via a joint contrastive objective. BRepCLIP produces more discriminative and semantically grounded embeddings than existing point-based alternatives, improving Top-1 retrieval over OpenShape by 40.4%, 22.0%, and 23.9% on ABC, CADParser, and Automate, respectively, and improving zero-shot classification on FabWave by 15% in Top-1 score. We further demonstrate its utility as a CAD-aware similarity metric for evaluating text- and image-conditioned CAD generation, establishing the importance of structure-aware pretraining for multimodal CAD understanding. The project page is available at https://muhammadusama100.github.io/BrepClip2026/
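
To make the joint contrastive objective concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the exact loss, so the sketch assumes a CLIP-style symmetric InfoNCE term between BRep and text embeddings plus an equally weighted term between BRep and image embeddings. The function names (`l2_normalize`, `info_nce`, `joint_contrastive_loss`), the temperature of 0.07, and the unweighted sum are all illustrative assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.07):
    """Symmetric InfoNCE loss over a batch.

    Row i of `anchors` is the positive match for row i of `targets`;
    every other row in the batch acts as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    n = len(a)
    # Cosine similarities scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the anchor-to-target and target-to-anchor directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def joint_contrastive_loss(brep, text, image, temperature=0.07):
    """Assumed joint objective: BRep-text term plus BRep-image term."""
    return info_nce(brep, text, temperature) + info_nce(brep, image, temperature)
```

In this sketch, `brep` would hold the global embeddings from the transformer encoder, while `text` and `image` would come from CLIP's encoders. Calling `joint_contrastive_loss` on a batch whose rows are correctly paired yields a lower loss than calling it on the same batch with the text and image rows shuffled, which is the behavior a contrastive alignment objective is meant to reward.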