Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation
May 24, 2026
Authors: Aviral Chharia, Fernando De la Torre
cs.AI
Abstract
High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under this constraint. At its core is a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine while capturing long-range dependencies. Within each HiSS block, we replace Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS), which aligns the recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluating 3D head models. Project page and code: https://humansensinglab.github.io/MVCHead/
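The idea of a bidirectional, axis-aligned state scan can be illustrated with a deliberately simplified sketch. This is not the authors' HiBiSS. It assumes a scalar linear recurrence with fixed coefficients (h_t = a·h_{t-1} + b·x_t, y_t = c·h_t) in place of Mamba's input-dependent selective scan. The function names (`linear_scan`, `bidirectional_scan`, `scan_along_axis`) and the choice to merge the two directions by summation are hypothetical. The sketch shows how running the recurrence forward and backward along a chosen axis of a feature grid lets every position receive context from both sides, and how switching the scan axis changes which neighbours are connected.

```python
from typing import List

Grid = List[List[float]]


def linear_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Unidirectional scalar linear recurrence (a simplified, NON-selective stand-in
    for a Mamba scan): h_t = a*h_{t-1} + b*x_t, y_t = c*h_t, starting from h_0 = 0."""
    h = 0.0
    ys: List[float] = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run the recurrence left-to-right and right-to-left, then sum the outputs
    (summation is an assumed merge rule; other choices are possible)."""
    forward = linear_scan(xs, a, b, c)
    backward = linear_scan(xs[::-1], a, b, c)[::-1]  # re-reverse to align positions
    return [f + r for f, r in zip(forward, backward)]


def scan_along_axis(grid: Grid, axis: int, a: float, b: float, c: float) -> Grid:
    """Apply the bidirectional scan along rows (axis=1) or columns (axis=0)."""
    if axis == 1:
        return [bidirectional_scan(list(row), a, b, c) for row in grid]
    if axis == 0:
        columns = [list(col) for col in zip(*grid)]  # transpose to columns
        scanned = [bidirectional_scan(col, a, b, c) for col in columns]
        return [list(row) for row in zip(*scanned)]  # transpose back
    raise ValueError("axis must be 0 (columns) or 1 (rows)")


if __name__ == "__main__":
    grid = [[1.0, 0.0], [0.0, 0.0]]
    print(scan_along_axis(grid, axis=0, a=0.5, b=1.0, c=1.0))
    print(scan_along_axis(grid, axis=1, a=0.5, b=1.0, c=1.0))
```

With a single impulse in the top-left cell, scanning along columns spreads it downward while scanning along rows spreads it to the right. Choosing `axis` plays the role that, in the abstract's description, aligning the recurrence with the direction of strongest multi-view inconsistency would play. How MVCHead actually selects that axis, and how the hierarchy is organized, is not specified here.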