cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

Abstract

Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as input for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLMs), we propose a multi-modal CAD reconstruction model that handles all three input modalities within a single model. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback obtained programmatically. Furthermore, we are the first to explore RL fine-tuning of LLMs for CAD tasks, demonstrating that online RL algorithms such as Group Relative Policy Optimization (GRPO) outperform offline alternatives. On the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches on all three input modalities. More importantly, after RL fine-tuning, cadrille sets a new state of the art on three challenging datasets, including a real-world one.

