
CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization

September 25, 2025
Authors: Ruiyu Wang, Shizhao Sun, Weijian Ma, Jiang Bian
cs.AI

Abstract

Computer-Aided Design (CAD) is a foundational component of industrial prototyping, where models are defined not by raw coordinates but by construction sequences such as sketches and extrusions. This sequential structure enables both efficient prototype initialization and subsequent editing. Text-guided CAD prototyping, which unifies Text-to-CAD generation and CAD editing, has the potential to streamline the entire design pipeline. However, prior work has not explored this setting, largely because standard large language model (LLM) tokenizers decompose CAD sequences into natural-language word pieces, failing to capture primitive-level CAD semantics and hindering attention modules from modeling geometric structure. We conjecture that a multimodal tokenization strategy, aligned with CAD's primitive and structural nature, can provide more effective representations. To this end, we propose CAD-Tokenizer, a framework that represents CAD data with modality-specific tokens using a sequence-based VQ-VAE with primitive-level pooling and constrained decoding. This design produces compact, primitive-aware representations that align with CAD's structural nature. Applied to unified text-guided CAD prototyping, CAD-Tokenizer significantly improves instruction following and generation quality, achieving better quantitative and qualitative performance over both general-purpose LLMs and task-specific baselines.
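The core tokenization idea — pooling low-level CAD sequence tokens to the primitive level and vector-quantizing the pooled vectors into discrete, modality-specific tokens — can be sketched as follows. This is a minimal illustration, not the paper's architecture: the primitive grouping, embedding dimensions, and random (untrained) codebook are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a CAD construction sequence of 12 low-level tokens,
# grouped into 3 primitives (e.g. line, arc, extrude) of 4 tokens each.
seq = rng.normal(size=(12, 16))           # token embeddings (T=12, d=16)
primitive_ids = np.repeat([0, 1, 2], 4)   # which primitive each token belongs to

# Primitive-level pooling: mean-pool token embeddings within each primitive,
# so every primitive is summarized by a single vector.
pooled = np.stack([seq[primitive_ids == p].mean(axis=0) for p in range(3)])

# Vector quantization (VQ-VAE bottleneck): snap each pooled vector to its
# nearest codebook entry; the index becomes the discrete primitive token.
codebook = rng.normal(size=(64, 16))      # K=64 code vectors (random here, learned in practice)
dists = ((pooled[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
codes = dists.argmin(axis=1)              # one discrete token per primitive
```

The sequence of 12 word-piece-level tokens is thus compressed to 3 primitive-aware discrete tokens, which is the kind of compact representation the abstract argues aligns with CAD's structural nature.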
PDF — September 29, 2025