BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding
June 3, 2026
Authors: Muhammad Usama, Didier Stricker, Mohammad Sadil Khan, Muhammad Zeshan Afzal
cs.AI
Abstract
Learning representations of CAD models is a largely open problem. While 3D representation learning has flourished around point clouds and meshes, the native format of CAD, the boundary representation (BRep), which encodes exact parametric surfaces, curves, and their topology, has received little attention as a substrate for representation learning. We introduce BRepCLIP, the first framework to align BRep geometry with language and image embeddings through contrastive pretraining. We model each CAD object as a sequence of face and edge tokens with separate discrete vocabularies for surface and curve geometry, augmented with spatial and semantic descriptors that capture surface types (e.g., cylinder, torus, NURBS) and curve primitives (e.g., line, arc, B-spline). A transformer encoder aggregates these tokens into a global BRep embedding, which is aligned with CLIP's text and image encoders via a joint contrastive objective. BRepCLIP produces more discriminative and semantically grounded embeddings than existing point-based alternatives, improving Top-1 retrieval over OpenShape by 40.4%, 22.0%, and 23.9% on ABC, CADParser, and Automate, respectively, and improving zero-shot Top-1 classification on FabWave by 15%. We further demonstrate its utility as a CAD-aware similarity metric for evaluating text- and image-conditioned CAD generation, establishing the importance of structure-aware pretraining for multimodal CAD understanding. The project page is available at https://muhammadusama100.github.io/BrepClip2026/
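To make the "joint contrastive objective" and the Top-1 retrieval metric concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state the exact loss, so this assumes a CLIP-style symmetric InfoNCE loss between the BRep embedding and each of the text and image embeddings, combined by a weighted sum. The function names, the temperature of 0.07, and the equal weights are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: row i of `a` and row i of `b` form the positive pair.

    The temperature value is an assumption, not taken from the paper.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature          # (N, N) similarity matrix
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)


def joint_contrastive_loss(brep, text, image, w_text=0.5, w_image=0.5):
    """Assumed joint objective: weighted sum of BRep-text and BRep-image losses."""
    return (w_text * symmetric_info_nce(brep, text)
            + w_image * symmetric_info_nce(brep, image))


def top1_retrieval_accuracy(queries, gallery):
    """Fraction of queries whose nearest gallery item (by cosine) has the same index."""
    sims = l2_normalize(queries) @ l2_normalize(gallery).T
    return float((sims.argmax(axis=1) == np.arange(len(queries))).mean())
```

In this sketch, `brep` would be the batch of global embeddings from the transformer encoder, and `text` and `image` the corresponding CLIP embeddings, each of shape (batch, dim). A lower loss means matched BRep, text, and image rows are closer to each other than to the other items in the batch, which is the property the retrieval metric then measures.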