BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding

Abstract

Learning representations of CAD models is a largely open problem. While 3D representation learning has flourished around point clouds and meshes, the native format of CAD, the boundary representation (BRep), which encodes exact parametric surfaces, curves, and their topology, has received little attention as a substrate for representation learning. We introduce BRepCLIP, the first framework to align BRep geometry with language and image embeddings through contrastive pretraining. We model each CAD object as a sequence of face and edge tokens with separate discrete vocabularies for surface and curve geometry, augmented with spatial and semantic descriptors that capture surface types (e.g., cylindrical, torus, NURBS) and curve primitives (e.g., line, arc, B-spline). A transformer encoder aggregates these tokens into a global BRep embedding, which is aligned with CLIP's text and image encoders via a joint contrastive objective. BRepCLIP produces more discriminative and semantically grounded embeddings than existing point-based alternatives, improving Top-1 retrieval over OpenShape by 40.4%, 22.0%, and 23.9% on ABC, CADParser, and Automate, respectively, and improving zero-shot classification on FabWave by 15% in Top-1 score. We further demonstrate its utility as a CAD-aware similarity metric for evaluating text- and image-conditioned CAD generation, establishing the importance of structure-aware pretraining for multimodal CAD understanding. The project page is available at https://muhammadusama100.github.io/BrepClip2026/
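
To make the phrase "joint contrastive objective" concrete, the following is a minimal sketch of one common form such an objective can take: a CLIP-style symmetric InfoNCE loss, summed over the BRep-text and BRep-image pairs. The abstract does not specify the exact loss, so everything here is an assumption for illustration, including the function names, the equal weighting of the two terms, and the temperature of 0.07. Random vectors stand in for the encoder outputs.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(a, b, temperature=0.07):
    """CLIP-style loss: row i of `a` should match row i of `b` and no other row.

    The temperature value is an assumed default, not taken from the paper.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(len(a))                   # matching pairs lie on the diagonal
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def joint_contrastive_loss(brep, text, image, temperature=0.07):
    """Hypothetical joint objective: BRep-text term plus BRep-image term."""
    return (symmetric_info_nce(brep, text, temperature)
            + symmetric_info_nce(brep, image, temperature))

# Toy check with random stand-ins for the three encoders' outputs.
rng = np.random.default_rng(0)
brep = rng.normal(size=(4, 8))
aligned = joint_contrastive_loss(brep, brep, brep)               # perfect pairing
shuffled = joint_contrastive_loss(brep, brep[::-1], brep[::-1])  # wrong pairing
print(aligned < shuffled)
```

In this sketch the loss is small when each BRep embedding is closest to its own text and image embeddings and grows when the pairing is scrambled, which is the property that retrieval and zero-shot classification rely on.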