Multi-View Consistent 3D Gaussian Head Avatars "Without" Multi-View Generation

Abstract

High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation and regresses 3D Gaussians under these constraints. At its core is our proposed Hierarchical State Space (HiSS) block, which progressively refines Gaussians from coarse to fine while capturing long-range dependencies. Within each HiSS block, we replace Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS), which aligns the recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluating 3D head models. Project page and code: https://humansensinglab.github.io/MVCHead/
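
To make the contrast between a unidirectional and a bi-directional scan concrete, the following is a minimal sketch of a generic linear state recurrence run in both directions. It is not the HiBiSS module described above: the function names `linear_scan` and `bidirectional_scan`, the scalar coefficients `a` and `b`, and the choice to sum the two passes are all assumptions made for illustration, and the hierarchical, axis-aligned ordering of the actual method is not modeled.

```python
import numpy as np


def linear_scan(x, a, b):
    """Unidirectional scan over axis 0: h[t] = a * h[t-1] + b * x[t], h[-1] = 0.

    `a` and `b` are hypothetical fixed scalars; a real state space layer
    would learn (and typically input-condition) these parameters.
    """
    h = np.zeros_like(x, dtype=float)
    state = np.zeros(x.shape[1:], dtype=float)
    for t in range(x.shape[0]):
        state = a * state + b * x[t]
        h[t] = state
    return h


def bidirectional_scan(x, a=0.5, b=1.0):
    """Sum of a forward scan and a backward scan (assumed fusion rule)."""
    forward = linear_scan(x, a, b)
    # Reverse the sequence, scan, then restore the original order.
    backward = linear_scan(x[::-1], a, b)[::-1]
    return forward + backward


# An impulse at the middle token of a length-3 sequence.
tokens = np.array([[0.0], [1.0], [0.0]])
uni = linear_scan(tokens, a=0.5, b=1.0)
bi = bidirectional_scan(tokens, a=0.5, b=1.0)
print(uni.ravel())
print(bi.ravel())
```

In the unidirectional pass the first token receives nothing from the impulse that follows it, whereas the bi-directional pass propagates the impulse to both neighbours. This is the general reason a two-way recurrence helps when every element must agree with elements on either side of it in the scan order.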