cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

Abstract

Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as inputs for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLMs), we propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback obtained programmatically. Furthermore, we are the first to explore RL fine-tuning of LLMs for CAD tasks, demonstrating that online RL algorithms such as Group Relative Policy Optimization (GRPO) outperform offline alternatives. On the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously. More importantly, after RL fine-tuning, cadrille sets a new state of the art on three challenging datasets, including a real-world one.
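As a rough illustration of the online RL stage, the sketch below shows the group-relative advantage computation that GRPO is generally described as using: several outputs are sampled for the same input, each is scored, and the scores are normalized within the group. The abstract does not specify the reward or the exact GRPO variant, so the function name, the `eps` parameter, and the example reward values are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same input.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed small constant that avoids division by zero
    when all rewards in the group are identical.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for four CAD programs sampled for one input.
# In an online setting these would come from a programmatic check of each
# generated program; the values here are made up.
rewards = [0.9, 0.4, 0.0, 0.7]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and are reinforced, while those below it are discouraged; no separate value network is needed to form this baseline.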
