Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
February 10, 2026
Authors: Ruisi Zhao, Haoren Zheng, Zongxin Yang, Hehe Fan, Yi Yang
cs.AI
Abstract
Rigged 3D assets are fundamental to 3D deformation and animation. However, existing 3D generation methods struggle to produce animatable geometry, while rigging techniques lack fine-grained structural control over skeleton creation. To address these limitations, we introduce Stroke3D, a novel framework that directly generates rigged meshes from two user inputs: 2D drawn strokes and a descriptive text prompt. Our approach pioneers a two-stage pipeline: 1) Controllable Skeleton Generation, in which we employ the Skeletal Graph VAE (Sk-VAE) to encode the skeleton's graph structure into a latent space, where the Skeletal Graph DiT (Sk-DiT) generates a skeletal embedding. The generation process is conditioned on both the text for semantics and the 2D strokes for explicit structural control, and the VAE's decoder reconstructs the final high-quality 3D skeleton; and 2) Enhanced Mesh Synthesis via TextuRig and SKA-DPO, in which we synthesize a textured mesh conditioned on the generated skeleton. For this stage, we first enhance an existing skeleton-to-mesh model by augmenting its training data with TextuRig, a dataset of captioned, textured, and rigged meshes curated from Objaverse-XL. We then employ a preference optimization strategy, SKA-DPO, guided by a skeleton-mesh alignment score, to further improve geometric fidelity. Together, these components enable a more intuitive workflow for creating ready-to-animate 3D content. To the best of our knowledge, our work is the first to generate rigged 3D meshes conditioned on user-drawn 2D strokes. Extensive experiments demonstrate that Stroke3D produces plausible skeletons and high-quality meshes.
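To make the preference-optimization idea concrete, the following is a minimal sketch of how an alignment score could rank two candidate meshes and feed a DPO-style loss. It is not the authors' implementation: the abstract does not define the alignment score or the exact objective, so `alignment_score` (mean nearest-vertex distance from each joint) is a hypothetical stand-in, `dpo_loss` uses the standard DPO formulation for a single pair, and all function names and the `beta` value are assumptions for illustration.

```python
import math


def alignment_score(joints, vertices):
    """Hypothetical skeleton-mesh alignment score (an assumption, not the paper's).

    Returns the negative mean distance from each joint to its nearest
    mesh vertex, so a higher score means the mesh hugs the skeleton more closely.
    """
    total = 0.0
    for joint in joints:
        total += min(math.dist(joint, vertex) for vertex in vertices)
    return -total / len(joints)


def pick_preference_pair(joints, mesh_a, mesh_b):
    """Order two candidate meshes as (preferred, rejected) by alignment score."""
    if alignment_score(joints, mesh_a) >= alignment_score(joints, mesh_b):
        return mesh_a, mesh_b
    return mesh_b, mesh_a


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_* are log-likelihoods under the model being tuned; ref_logp_* are
    log-likelihoods under the frozen reference model. beta is illustrative.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    tight_mesh = [(0.0, 0.0, 0.1), (0.0, 1.0, 0.1)]
    loose_mesh = [(0.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
    preferred, rejected = pick_preference_pair(joints, loose_mesh, tight_mesh)
    print(preferred is tight_mesh)
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss shrinks as the tuned model assigns relatively more likelihood to the better-aligned mesh than the reference model does, which is the mechanism by which a score of this kind could steer generation toward geometry that fits the skeleton.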