CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM

Abstract

This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models from user inputs in the form of textual descriptions, images, point clouds, or any combination of them. Towards this goal, we introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and employ advanced large language models (LLMs) to align the feature space across these diverse multimodal data and the vectorized representations of CAD models. To facilitate model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. The resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains a textual description, multi-view images, points, and a command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noise and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
