PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting

Abstract

As text-conditioned diffusion models (DMs) achieve breakthroughs in image, video, and 3D generation, the research community's focus has shifted to the more challenging task of text-to-4D synthesis, which adds a temporal dimension to generate dynamic 3D objects. In this context, we identify Score Distillation Sampling (SDS), a widely used technique for text-to-3D synthesis, as a significant hindrance to text-to-4D performance: it suffers from the Janus-faced problem, produces unrealistic textures, and incurs high computational cost. In this paper, we propose Pixel-Level Alignments for Text-to-4D Gaussian Splatting (PLA4D), a novel method that uses text-to-video frames as explicit pixel alignment targets to generate static 3D objects and inject motion into them. Specifically, we introduce Focal Alignment to calibrate camera poses for rendering and GS-Mesh Contrastive Learning to distill geometry priors from rendered image contrasts at the pixel level. Additionally, we develop Motion Alignment, which uses a deformation network to drive changes in the Gaussians, and implement Reference Refinement to obtain smooth 4D object surfaces. These techniques enable 4D Gaussian Splatting to align geometry, texture, and motion with generated videos at the pixel level. Compared to previous methods, PLA4D produces synthesized outputs with better texture details in less time and effectively mitigates the Janus-faced problem. PLA4D is fully implemented using open-source models, offering an accessible, user-friendly, and promising direction for 4D digital content creation. Our project page: https://github.com/MiaoQiaowei/PLA4D.github.io
