
Voxify3D: Pixel Art Meets Volumetric Rendering

December 8, 2025
Authors: Yi-Chuan Huang, Jiewen Chan, Hao-Jen Chien, Yu-Lun Liu
cs.AI

Abstract

Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging because geometric abstraction, semantic preservation, and discrete color coherence impose conflicting requirements. Existing methods either over-simplify geometry or fail to achieve the pixel-precise, palette-constrained aesthetics of voxel art. We introduce Voxify3D, a differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Our core innovation lies in the synergistic integration of three components: (1) orthographic pixel art supervision that eliminates perspective distortion for precise voxel-pixel alignment; (2) patch-based CLIP alignment that preserves semantics across discretization levels; (3) palette-constrained Gumbel-Softmax quantization enabling differentiable optimization over discrete color spaces with controllable palette strategies. This integration addresses three fundamental challenges: semantic preservation under extreme discretization, pixel-art aesthetics through volumetric rendering, and end-to-end discrete optimization. Experiments show superior performance (37.12 CLIP-IQA, 77.90% user preference) across diverse characters and controllable abstraction (2-8 colors, 20x-50x resolutions). Project page: https://yichuanh.github.io/Voxify-3D/
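To make component (3) more concrete, the following is a minimal sketch of how a palette-constrained Gumbel-Softmax relaxation can blend a fixed set of palette colors for a single voxel. It is not the authors' implementation, which the abstract does not describe. The function names, the four-color `PALETTE`, the logits, and the temperature `tau` are all illustrative assumptions, and only the forward pass is shown in plain Python (a real pipeline would use an autodiff framework so that gradients reach the logits).

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return relaxed one-hot weights for categorical logits (forward pass only)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        # u is clamped away from 0 and 1 to keep the logarithms finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_palette_color(logits, palette, tau, rng):
    """Blend palette colors with Gumbel-Softmax weights (the relaxed color)."""
    weights = gumbel_softmax(logits, tau, rng)
    return tuple(
        sum(w * color[ch] for w, color in zip(weights, palette))
        for ch in range(3)
    )


def hard_palette_color(logits, palette):
    """Pick the single palette entry with the largest logit (the discrete color)."""
    index = max(range(len(logits)), key=lambda i: logits[i])
    return palette[index]


# Hypothetical 4-color RGB palette and per-voxel logits, for illustration only.
PALETTE = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.2), (0.2, 0.6, 0.9), (0.95, 0.95, 0.9)]
voxel_logits = [0.2, 2.5, -1.0, 0.1]

rng = random.Random(0)
weights = gumbel_softmax(voxel_logits, tau=0.5, rng=rng)
soft = soft_palette_color(voxel_logits, PALETTE, tau=0.5, rng=rng)
hard = hard_palette_color(voxel_logits, PALETTE)
print("relaxed weights:", weights)
print("soft color:", soft)
print("hard color:", hard)
```

In this sketch the soft color is always a convex combination of palette entries. Lowering `tau` pushes the weights toward a one-hot vector, so the blended color approaches a single palette entry, which is what makes the relaxation usable as a stand-in for a discrete choice during optimization.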