Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation
May 24, 2026
Authors: Aviral Chharia, Fernando De la Torre
cs.AI
Abstract
High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine while capturing long-range dependencies. Within each HiSS block, we replace Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS), which aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models. Project page and code: https://humansensinglab.github.io/MVCHead/
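
The abstract contrasts a unidirectional state scan with a bidirectional one. As a generic illustration of that distinction only, the sketch below runs a scalar linear recurrence forward and backward over a sequence and sums the two passes. It is not the authors' HiBiSS: the hierarchy, the choice of scan axes, and Mamba's input-dependent parameters are all omitted, and the names `linear_scan`, `bidirectional_scan`, `a`, and `b` are hypothetical.

```python
from typing import List


def linear_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Causal scan h[t] = a * h[t-1] + b * x[t], starting from h = 0.

    Assumption: fixed scalar coefficients a and b, chosen for illustration.
    """
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Sum of a forward scan and a backward scan over the same sequence."""
    forward = linear_scan(xs, a, b)
    # Scan the reversed sequence, then flip the result back into input order.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


# An impulse at the last position: the forward scan leaves earlier
# positions at zero, while the bidirectional scan propagates it backward.
signal = [0.0, 0.0, 1.0]
print(linear_scan(signal, a=0.5, b=1.0))         # [0.0, 0.0, 1.0]
print(bidirectional_scan(signal, a=0.5, b=1.0))  # [0.25, 0.5, 2.0]
```

In this toy setting the first element only receives information about the last one through the backward pass, which is the general motivation for scanning in both directions.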