Voxify3D: Pixel Art Meets Volumetric Rendering

December 8, 2025
Authors: Yi-Chuan Huang, Jiewen Chan, Hao-Jen Chien, Yu-Lun Liu
cs.AI

Abstract

Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting requirements of geometric abstraction, semantic preservation, and discrete color coherence. Existing methods either over-simplify geometry or fail to achieve the pixel-precise, palette-constrained aesthetics of voxel art. We introduce Voxify3D, a differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Our core innovation lies in the synergistic integration of three components: (1) orthographic pixel art supervision that eliminates perspective distortion for precise voxel-pixel alignment; (2) patch-based CLIP alignment that preserves semantics across discretization levels; (3) palette-constrained Gumbel-Softmax quantization enabling differentiable optimization over discrete color spaces with controllable palette strategies. This integration addresses fundamental challenges: semantic preservation under extreme discretization, pixel-art aesthetics through volumetric rendering, and end-to-end discrete optimization. Experiments show superior performance (37.12 CLIP-IQA, 77.90% user preference) across diverse characters and controllable abstraction (2-8 colors, 20x-50x resolutions). Project page: https://yichuanh.github.io/Voxify-3D/
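
The abstract names palette-constrained Gumbel-Softmax quantization without giving details, so the following is only a minimal sketch of the general Gumbel-Softmax idea applied to a fixed palette, not the authors' implementation. It assumes each voxel carries one logit per palette entry; the function names `gumbel_softmax` and `voxel_color`, the temperature `tau`, and the example palette are all hypothetical. It uses plain Python and shows the forward pass only; a trainable version would typically sit inside an autodiff framework and use a straight-through estimator for the hard choice.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return relaxed one-hot weights over palette entries for one voxel."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 so the logarithm stays finite.
        u = max(rng.random(), 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def voxel_color(logits, palette, tau=1.0, rng=None, hard=False):
    """Blend palette colors by the relaxed weights, or snap to a single entry."""
    rng = rng or random.Random()
    weights = gumbel_softmax(logits, tau, rng)
    if hard:
        # Discrete choice: the palette entry with the largest weight.
        best = max(range(len(weights)), key=weights.__getitem__)
        return palette[best]
    # Soft choice: convex combination of palette colors, channel by channel.
    return tuple(
        sum(w * color[ch] for w, color in zip(weights, palette))
        for ch in range(3)
    )


if __name__ == "__main__":
    # Hypothetical 3-color palette (RGB in [0, 1]) and logits for one voxel.
    palette = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.2), (0.2, 0.8, 0.3)]
    logits = [0.5, 2.0, -1.0]
    print(voxel_color(logits, palette, tau=1.0, rng=random.Random(0)))
    print(voxel_color(logits, palette, tau=0.1, rng=random.Random(0), hard=True))
```

Lowering `tau` pushes the weights toward a one-hot vector, so the blended color approaches a single palette entry. This is the usual reason for annealing the temperature when optimizing over a discrete color set.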