
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models

January 31, 2025
Authors: Ruiyu Wang, Yu Yuan, Shizhao Sun, Jiang Bian
cs.AI

Abstract

Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial in streamlining this process. Recent studies have utilized ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. However, CAD models are inherently multimodal, comprising parametric sequences and corresponding rendered visual objects. Besides, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs using ground-truth parametric sequences, enabling the generation of logically coherent parametric sequences. In the VF stage, we reward parametric sequences that render into visually preferred objects and penalize those that do not, allowing LLMs to learn how rendered visual objects are perceived and evaluated. These two stages alternate throughout the training, ensuring balanced learning and preserving benefits of both signals. Experiments demonstrate that CADFusion significantly improves performance, both qualitatively and quantitatively.
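The alternating two-stage schedule the abstract describes can be sketched as a training loop. The following is a minimal toy illustration, not the actual CADFusion implementation: the "model" is a single scalar, the SL step stands in for supervised learning on ground-truth parametric sequences, and the VF step stands in for preference-based learning from rendered-object rewards. All function names and the `reward_fn` interface are hypothetical.

```python
import random

random.seed(0)  # make the toy run reproducible

def sequential_learning_step(model, batch):
    # SL stage stand-in: move toward the mean of the ground-truth batch,
    # analogous to a supervised gradient step on parametric sequences.
    target = sum(batch) / len(batch)
    return model + 0.5 * (target - model)

def visual_feedback_step(model, reward_fn):
    # VF stage stand-in: sample two candidate "sequences", score their
    # "renderings" with reward_fn, and move toward the preferred one,
    # analogous to rewarding visually preferred outputs.
    a = model + random.uniform(-1, 1)
    b = model + random.uniform(-1, 1)
    preferred = a if reward_fn(a) >= reward_fn(b) else b
    return model + 0.5 * (preferred - model)

def train(model, data, reward_fn, rounds=4):
    # Alternate the SL and VF stages throughout training, as in the
    # two-stage schedule described in the abstract.
    for _ in range(rounds):
        for batch in data:
            model = sequential_learning_step(model, batch)
        model = visual_feedback_step(model, reward_fn)
    return model
```

Under this sketch, the SL updates keep generations anchored to the supervised signal while the VF updates steer them toward higher-reward renderings; alternating the two is what prevents either signal from being overwritten by the other.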
