Helix4D: Complex 4D Mesh Generation

Abstract

Current video-to-4D methods struggle with complex topology changes, transparent materials, thin-walled structures, and inner surfaces. We present Helix4D, a dynamic mesh generation framework that inherits the expressive representation of Trellis2 and extends it from image-to-3D generation to video-conditioned 4D generation. Our design is driven by two questions: (a) how to let Trellis2's frame-local attention share information across frames while preserving its pretrained quality on rare cases such as transparent objects and inner surfaces, and (b) how to inject temporal information into a purely 3D positional encoding without breaking pretrained capabilities. We address (a) with sliding-window cross-frame attention anchored on the first frame: the first frame is generated by the base Trellis2 model and injected into our model, which inherits Trellis2's quality on rare cases through cross-frame attention. We address (b) with a 4D temporal encoding that repurposes redundant low-frequency spatial RoPE bands for time, extending the encoding from 3D to 4D with no additional parameters. Extensive experiments on ActionBench and on our own challenging complex-dynamics dataset show that Helix4D generates high-quality dynamic meshes efficiently.
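To make the two mechanisms concrete, the following is a minimal NumPy sketch under stated assumptions, not the Helix4D implementation. The function names, the symmetric window, the number of bands per axis, and the frequencies assigned to time are all hypothetical; the abstract specifies only that attention uses a sliding window anchored on the first frame, and that low-frequency spatial RoPE bands are reused for time without new parameters.

```python
import numpy as np


def frame_attention_mask(num_frames: int, window: int) -> np.ndarray:
    """Frame-level mask: mask[i, j] is True if frame i may attend to frame j.

    Assumption: the window is symmetric (|i - j| <= window). The source does
    not say whether the window is symmetric or causal.
    """
    idx = np.arange(num_frames)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    # Anchor: every frame can also attend to the first frame, which in the
    # described pipeline is produced by the base image-to-3D model.
    mask[:, 0] = True
    return mask


def rope_angles_4d(xyz, t, bands_per_axis=8, time_bands=2, base=10000.0):
    """Rotation angles for a RoPE whose lowest-frequency bands encode time.

    xyz: (N, 3) spatial coordinates; t: (N,) frame indices.
    Returns angles of shape (N, 3, bands_per_axis).

    Assumption: the last `time_bands` bands of each axis (the lowest spatial
    frequencies) are the ones repurposed, and they reuse the highest spatial
    frequencies for time. Both choices are illustrative only.
    """
    xyz = np.asarray(xyz, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    k = np.arange(bands_per_axis)
    freqs = base ** (-k / bands_per_axis)      # ordered high -> low frequency
    angles = xyz[:, :, None] * freqs[None, None, :]
    # Overwrite the low-frequency slots with time-driven phases. No new
    # parameters are introduced: only the input to existing bands changes.
    time_freqs = freqs[:time_bands]
    angles[:, :, bands_per_axis - time_bands:] = (
        t[:, None, None] * time_freqs[None, None, :]
    )
    return angles


mask = frame_attention_mask(num_frames=5, window=1)
angles = rope_angles_4d(np.ones((2, 3)), np.array([0.0, 3.0]),
                        bands_per_axis=4, time_bands=1)
```

In this sketch the frame-level mask would be expanded to token level before being passed to an attention layer, and the angles would be turned into sine/cosine rotations applied to query and key channel pairs as in standard RoPE.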