MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification
December 10, 2025
Authors: Sangwoon Kwak, Weeyoung Kwon, Jun Young Jeong, Geonho Kim, Won-Sik Cheong, Jihyong Oh
cs.AI
Abstract
Recent advances in 4D Gaussian Splatting (4DGS) have extended the high-speed rendering capability of 3D Gaussian Splatting (3DGS) into the temporal domain, enabling real-time rendering of dynamic scenes. However, a major remaining challenge lies in modeling dynamic videos that contain long-range motion, where naive extensions of existing methods lead to severe memory explosion, temporal flickering, and failure to handle objects that become occluded, appear, or disappear over time. To address these challenges, we propose MoRel, a novel 4DGS framework built around an Anchor Relay-based Bidirectional Blending (ARBB) mechanism, which enables temporally consistent and memory-efficient modeling of long-range dynamic scenes. Our method progressively constructs locally canonical anchor spaces at key-frame time indices and models inter-frame deformations at the anchor level, enhancing temporal coherence. By learning bidirectional deformations between key-frame anchors (KfAs) and adaptively blending them through learnable opacity control, our approach mitigates temporal discontinuities and flickering artifacts. We further introduce a Feature-variance-guided Hierarchical Densification (FHD) scheme that efficiently densifies KfAs according to their assigned feature-variance level, preserving rendering quality while bounding memory growth. To evaluate our model's ability to handle real-world long-range 4D motion, we construct a new dataset containing long-range 4D motion, called SelfCap_{LR}; compared to previous dynamic video datasets, it is captured over spatially wider spaces and exhibits larger average dynamic motion magnitude. Overall, MoRel achieves temporally coherent, flicker-free long-range 4D reconstruction while maintaining bounded memory usage, demonstrating both the scalability and the efficiency of dynamic Gaussian-based representations.
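To make the bidirectional-blending idea concrete, the sketch below shows one plausible way to blend forward and backward anchor deformations between two key frames with a learnable, opacity-like weight. This is a minimal illustration, not the paper's implementation: the function name `blend_bidirectional`, the per-anchor logit parameterization, and the time-dependent weighting are all assumptions made for the example.

```python
import numpy as np

def blend_bidirectional(anchors_fwd, anchors_bwd, t, t_prev, t_next, blend_logit):
    """Blend two deformed anchor sets for an intermediate time step.

    anchors_fwd:  (N, 3) anchor positions deformed forward from the key frame at t_prev
    anchors_bwd:  (N, 3) anchor positions deformed backward from the key frame at t_next
    blend_logit:  (N,) learnable per-anchor logit (hypothetical parameterization)
    """
    # Normalized position of t within the key-frame interval [t_prev, t_next].
    s = (t - t_prev) / (t_next - t_prev)
    # A sigmoid maps the learnable logit into [0, 1]; multiplying by s biases
    # the blend toward the forward branch near t_prev and lets the learned
    # weight shift mass toward the backward branch as t approaches t_next.
    w = (1.0 / (1.0 + np.exp(-blend_logit))) * s
    w = w[:, None]  # broadcast over the xyz dimension
    return (1.0 - w) * anchors_fwd + w * anchors_bwd
```

At `t == t_prev` the output reduces to the forward-deformed anchors, so consecutive key-frame intervals hand off continuously, which is the property the relay mechanism relies on to suppress flicker at interval boundaries.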