FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding
October 13, 2025
Authors: Soroush Mehraban, Andrea Iaboni, Babak Taati
cs.AI
Abstract
Recent transformer-based models for 3D Human Mesh Recovery (HMR) have
achieved strong performance but often suffer from high computational cost and
complexity due to deep transformer architectures and redundant tokens. In this
paper, we introduce two HMR-specific merging strategies: Error-Constrained
Layer Merging (ECLM) and Mask-guided Token Merging (Mask-ToMe). ECLM
selectively merges transformer layers that have minimal impact on the Mean Per
Joint Position Error (MPJPE), while Mask-ToMe focuses on merging background
tokens that contribute little to the final prediction. To further address the
potential performance drop caused by merging, we propose a diffusion-based
decoder that incorporates temporal context and leverages pose priors learned
from large-scale motion capture datasets. Experiments across multiple
benchmarks demonstrate that our method achieves up to a 2.3× speed-up while
slightly improving performance over the baseline.
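The abstract names Mask-ToMe but does not describe its matching rule. The sketch below is one plausible reading, offered only as an illustration: it assumes Mask-ToMe resembles the bipartite soft matching of the original Token Merging (ToMe) method, restricted to tokens that a person mask labels as background. The function name `mask_guided_merge`, the alternating A/B split, the use of raw token features (ToMe itself matches on attention keys), and the plain-mean merge rule are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def mask_guided_merge(tokens: np.ndarray, fg_mask: np.ndarray, r: int):
    """Hypothetical mask-guided token merging (NOT the authors' code).

    tokens:  (N, C) patch tokens from one transformer layer.
    fg_mask: (N,) bool, True where the patch covers the person.
    r:       number of background tokens to remove by merging.
    Returns the reduced tokens (foreground first) and their mask.
    """
    fg = tokens[fg_mask]          # foreground tokens are never merged
    bg = tokens[~fg_mask]
    if r <= 0 or len(bg) < 2:
        return tokens.copy(), fg_mask.copy()

    # ToMe-style bipartite split of the background tokens: alternate into A and B.
    a, b = bg[0::2], bg[1::2]
    r = min(r, len(a))

    # Cosine similarity between every A token and every B token.
    eps = 1e-8
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    sim = a_n @ b_n.T                         # shape (|A|, |B|)
    best_b = sim.argmax(axis=1)               # closest B partner of each A token
    order = np.argsort(-sim.max(axis=1))      # most redundant A tokens first
    merged_a, kept_a = order[:r], order[r:]

    # Average each merged A token into its partner (plain mean, every
    # original token counts once; np.add.at handles repeated partners).
    b_sum = b.copy()
    b_cnt = np.ones(len(b))
    np.add.at(b_sum, best_b[merged_a], a[merged_a])
    np.add.at(b_cnt, best_b[merged_a], 1.0)
    b_new = b_sum / b_cnt[:, None]

    bg_new = np.concatenate([a[kept_a], b_new], axis=0)
    out = np.concatenate([fg, bg_new], axis=0)
    mask = np.concatenate([np.ones(len(fg), dtype=bool),
                           np.zeros(len(bg_new), dtype=bool)])
    return out, mask


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
m = np.array([True] * 3 + [False] * 7)
y, y_mask = mask_guided_merge(x, m, r=2)
print(x.shape, "->", y.shape)  # (10, 4) -> (8, 4)
```

With 3 foreground and 7 background tokens and r=2, the sequence shrinks from 10 to 8 tokens while the foreground tokens pass through unchanged. A real model would also need to track how many original tokens each merged token represents (ToMe uses this size in proportional attention) and keep the person mask aligned with the shortened sequence across layers; both are omitted from this sketch.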