MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification
December 10, 2025
Authors: Sangwoon Kwak, Weeyoung Kwon, Jun Young Jeong, Geonho Kim, Won-Sik Cheong, Jihyong Oh
cs.AI
Abstract
Recent advances in 4D Gaussian Splatting (4DGS) have extended the high-speed rendering capability of 3D Gaussian Splatting (3DGS) into the temporal domain, enabling real-time rendering of dynamic scenes. However, one of the major remaining challenges lies in modeling dynamic videos that contain long-range motion, where a naive extension of existing methods leads to severe memory explosion, temporal flickering, and failure to handle occlusions that appear or disappear over time. To address these challenges, we propose MoRel, a novel 4DGS framework characterized by an Anchor Relay-based Bidirectional Blending (ARBB) mechanism, which enables temporally consistent and memory-efficient modeling of long-range dynamic scenes. Our method progressively constructs locally canonical anchor spaces at key-frame time indices and models inter-frame deformations at the anchor level, enhancing temporal coherence. By learning bidirectional deformations between Key-frame Anchors (KfAs) and adaptively blending them through learnable opacity control, our approach mitigates temporal discontinuities and flickering artifacts. We further introduce a Feature-variance-guided Hierarchical Densification (FHD) scheme that efficiently densifies KfAs according to their assigned feature-variance levels, preserving rendering quality while controlling memory growth. To effectively evaluate our model's capability to handle real-world long-range 4D motion, we newly compose a long-range 4D motion dataset, SelfCap_{LR}. Compared to previous dynamic video datasets, it exhibits a larger average dynamic motion magnitude and is captured over spatially wider spaces. Overall, MoRel achieves temporally coherent and flicker-free long-range 4D reconstruction while maintaining bounded memory usage, demonstrating both the scalability and efficiency of dynamic Gaussian-based representations.
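To make the ARBB idea concrete, the following is a minimal PyTorch sketch of how two neighbouring key-frame anchor sets could be relayed to a query time and blended via learnable opacities. It is not the paper's implementation: the class name `ArbbSketch`, the tiny MLP deformation fields, the per-anchor gate parameters, and the simple linear time ramp are all assumptions introduced for illustration.

```python
import torch
import torch.nn as nn


class ArbbSketch(nn.Module):
    """Toy sketch of anchor relay-based bidirectional blending (ARBB).

    Anchors from two neighbouring key frames are deformed toward a query time
    (key frame k forward, key frame k+1 backward); each relay contributes with
    an opacity weight that combines a time ramp with a learnable per-anchor
    gate. A renderer would then splat the union of both deformed sets.
    Names, shapes, and the blending rule are illustrative only.
    """

    def __init__(self, n_anchors_k: int, n_anchors_k1: int, feat_dim: int = 32):
        super().__init__()

        def deform_net():
            # Tiny MLP standing in for a deformation field: (feature, time) -> xyz offset.
            return nn.Sequential(nn.Linear(feat_dim + 1, 64), nn.ReLU(), nn.Linear(64, 3))

        self.deform_fwd = deform_net()   # deforms key-frame k anchors forward in time
        self.deform_bwd = deform_net()   # deforms key-frame k+1 anchors backward in time
        # Learnable per-anchor blending gates (one logit per anchor in each relay).
        self.gate_k = nn.Parameter(torch.zeros(n_anchors_k, 1))
        self.gate_k1 = nn.Parameter(torch.zeros(n_anchors_k1, 1))

    def forward(self, xyz_k, feat_k, xyz_k1, feat_k1, tau: float):
        """tau in [0, 1]: normalised query time between the two key frames."""
        t_k = torch.full_like(xyz_k[:, :1], tau)
        t_k1 = torch.full_like(xyz_k1[:, :1], 1.0 - tau)

        # Bidirectional deformation of both key-frame anchor sets to time tau.
        xyz_from_k = xyz_k + self.deform_fwd(torch.cat([feat_k, t_k], dim=-1))
        xyz_from_k1 = xyz_k1 + self.deform_bwd(torch.cat([feat_k1, t_k1], dim=-1))

        # Opacity multipliers: fade-out / fade-in ramp times a learnable gate.
        op_from_k = (1.0 - tau) * torch.sigmoid(self.gate_k)
        op_from_k1 = tau * torch.sigmoid(self.gate_k1)
        return (xyz_from_k, op_from_k), (xyz_from_k1, op_from_k1)
```

In this reading, the opacity gates let the model suppress a relayed anchor whose content has been occluded or has left the scene, so the transition between key-frame intervals stays flicker-free instead of hard-switching at the key frame.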
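Similarly, the FHD scheme can be illustrated with a short sketch of feature-variance-guided level assignment. The quantile-based binning, the function name `assign_densification_levels`, and the "level l spawns l + 1 children" reading are assumptions of this sketch, not the paper's exact rule.

```python
import torch


def assign_densification_levels(anchor_feats: torch.Tensor, num_levels: int = 3) -> torch.Tensor:
    """Toy sketch of feature-variance-guided hierarchical level assignment.

    Each key-frame anchor carries a feature vector; anchors whose features vary
    more are treated as covering more complex regions and receive a higher
    densification level (i.e., they would spawn more child anchors), keeping
    densification, and hence memory growth, bounded elsewhere.
    """
    # Per-anchor variance across feature channels: (N, C) -> (N,)
    feat_var = anchor_feats.var(dim=-1)

    # Split the variance range into quantile bins -> integer level in [0, num_levels - 1].
    qs = torch.quantile(
        feat_var, torch.linspace(0.0, 1.0, num_levels + 1, device=feat_var.device)
    )
    return torch.bucketize(feat_var, qs[1:-1])


# Usage: anchors with the highest feature variance land in the top level.
feats = torch.randn(1000, 32)
levels = assign_densification_levels(feats)
print(levels.bincount())  # roughly equal-sized level buckets under quantile binning
```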