HandX: Scaling Bimanual Motion and Interaction Generation
March 30, 2026
Authors: Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui
cs.AI
Abstract
Human motion synthesis has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior, such as finger articulation, contact timing, and inter-hand coordination, and existing resources lack high-fidelity bimanual sequences that capture nuanced finger dynamics and collaboration. To fill this gap, we present HandX, a unified foundation spanning data, annotation, and evaluation. We consolidate and filter existing datasets for quality, and collect a new motion-capture dataset targeting underrepresented bimanual interactions with detailed finger dynamics. For scalable annotation, we introduce a decoupled strategy that first extracts representative motion features (e.g., contact events and finger flexion) and then leverages reasoning from large language models to produce fine-grained, semantically rich descriptions aligned with these features. Building on the resulting data and annotations, we benchmark diffusion and autoregressive models with versatile conditioning modes. Experiments demonstrate high-quality dexterous motion generation, supported by our newly proposed hand-focused metrics. We further observe clear scaling trends: larger models trained on larger, higher-quality datasets produce more semantically coherent bimanual motion. Our dataset is released to support future research.
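To make the decoupled annotation idea concrete, the sketch below shows one way the first stage (feature extraction) could look. It is an illustration only, not the authors' pipeline: the function names, the 1 cm contact threshold, the three-joint flexion definition, and the plain-text summary format are all assumptions introduced here. The second stage, passing the summary to a large language model, is left out.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # 3D joint position in meters (assumed unit)


def flexion_angle(base: Point, middle: Point, tip: Point) -> float:
    """Bend angle in degrees at `middle`: 0 for a straight finger, larger when curled."""
    u = [b - m for b, m in zip(base, middle)]
    v = [t - m for t, m in zip(tip, middle)]
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        return 0.0  # degenerate segment; treat as straight
    cos_inner = sum(x * y for x, y in zip(u, v)) / norm
    cos_inner = max(-1.0, min(1.0, cos_inner))  # guard against rounding error
    return 180.0 - math.degrees(math.acos(cos_inner))


def contact_events(
    left_tip: Sequence[Point],
    right_tip: Sequence[Point],
    threshold: float = 0.01,  # hypothetical 1 cm contact distance
) -> List[Tuple[int, int]]:
    """Return inclusive (start_frame, end_frame) intervals where the tips touch."""
    events: List[Tuple[int, int]] = []
    start = None
    n_frames = min(len(left_tip), len(right_tip))
    for frame in range(n_frames):
        touching = math.dist(left_tip[frame], right_tip[frame]) <= threshold
        if touching and start is None:
            start = frame
        elif not touching and start is not None:
            events.append((start, frame - 1))
            start = None
    if start is not None:  # contact still active at the last frame
        events.append((start, n_frames - 1))
    return events


def summarize_features(
    events: Sequence[Tuple[int, int]],
    peak_flexion_deg: float,
    fps: float = 30.0,  # assumed capture rate
) -> str:
    """Format extracted features as plain text that could be placed in an LLM prompt."""
    lines = [f"Peak index-finger flexion: {peak_flexion_deg:.0f} degrees."]
    for start, end in events:
        lines.append(
            f"Fingertip contact from {start / fps:.2f}s to {end / fps:.2f}s."
        )
    return "\n".join(lines)
```

Separating the two stages this way keeps the geometric measurements deterministic and checkable, so the language model only has to turn already-extracted events into a description rather than infer them from raw joint trajectories.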