TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos

Abstract

We present TT4D, a large-scale, high-fidelity table tennis dataset. It provides more than 140 hours of singles and doubles gameplay reconstructed from monocular broadcast videos. Its multimodal annotations include high-quality camera calibrations, precise 3D ball positions, ball spin, time segmentation, and time-varying 3D human meshes. This rich data provides a new foundation for virtual replay, in-depth player analysis, and robot learning. The dataset combines scale and precision through a novel reconstruction pipeline. Prior methods first partition a game sequence into individual shot segments based on the 2D ball track, and only then attempt reconstruction. However, 2D-based time segmentation breaks down under occlusion and across varied camera viewpoints, which prevents reliable reconstruction. We invert this paradigm: a learned lifting network first lifts the entire unsegmented 2D ball track to 3D, and the resulting 3D trajectory then makes time segmentation reliable. The lifting network also infers the ball's spin, handles unreliable ball detections, and reconstructs the ball trajectory even under heavy occlusion. This lift-first design is essential: our pipeline is currently the only method capable of reconstructing table tennis gameplay from general-view monocular broadcast videos. We demonstrate the dataset's fidelity on two downstream tasks: estimating the racket's pose and velocity at impact, and training a generative model of competitive rallies.
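
The segmentation rule applied to the lifted 3D trajectory is not specified here. The sketch below is a minimal, hypothetical illustration of why a 3D track simplifies segmentation. It is not the TT4D implementation. It assumes a table-centred frame: x runs along the 2.74 m table, y across the 1.525 m width, and z points up, with the playing surface at z = 0. A frame is labelled a bounce when the ball's vertical velocity flips from downward to upward near the table plane. A frame is labelled a hit when the ball's velocity along the table reverses. The names `detect_events`, `split_shots`, and `BOUNCE_TOL` are illustrative assumptions.

```python
import numpy as np

# Hypothetical table-centred frame: x along the table, y across it, z up,
# playing surface at z = 0, net above x = 0 (units: metres).
HALF_LENGTH = 1.37   # half of the 2.74 m regulation table length
HALF_WIDTH = 0.7625  # half of the 1.525 m regulation table width
BOUNCE_TOL = 0.05    # assumed height tolerance around the table plane


def detect_events(traj: np.ndarray) -> list[tuple[int, str]]:
    """Label bounce and hit frames in a (T, 3) lifted 3D ball trajectory."""
    vel = np.diff(traj, axis=0)  # vel[t] = traj[t + 1] - traj[t]
    events = []
    for t in range(1, len(vel)):
        x, y, z = traj[t]
        on_table = (abs(z) < BOUNCE_TOL and abs(x) <= HALF_LENGTH
                    and abs(y) <= HALF_WIDTH)
        # Bounce: vertical motion flips from downward to upward at the table.
        if on_table and vel[t - 1, 2] < 0 <= vel[t, 2]:
            events.append((t, "bounce"))
        # Hit: motion along the table's long axis reverses direction.
        elif vel[t - 1, 0] * vel[t, 0] < 0:
            events.append((t, "hit"))
    return events


def split_shots(events: list[tuple[int, str]],
                num_frames: int) -> list[tuple[int, int]]:
    """Cut the rally into [start, end) frame ranges, one per shot, at each hit."""
    hits = [t for t, kind in events if kind == "hit"]
    bounds = [0] + hits + [num_frames]
    return [(s, e) for s, e in zip(bounds[:-1], bounds[1:]) if e > s]


# Synthetic rally: the ball bounces at frame 10, is hit back at frame 20,
# and bounces on the other half of the table at frame 30.
frames = np.arange(36)
x = np.where(frames <= 20, -0.5 + 0.1 * frames, 1.5 - 0.2 * (frames - 20))
y = np.zeros_like(x)
z = 0.05 * np.minimum(np.abs(frames - 10), np.abs(frames - 30))
traj = np.stack([x, y, z], axis=1)

events = detect_events(traj)
print(events)                          # [(10, 'bounce'), (20, 'hit'), (30, 'bounce')]
print(split_shots(events, len(traj)))  # [(0, 20), (20, 36)]
```

In 3D, both tests reduce to sign checks on the velocity along physically meaningful axes. In 2D image coordinates, perspective and occlusion can hide or distort these sign changes. That is one way to read the point that 2D-first segmentation fails under occlusion and varied viewpoints.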
