FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding
October 13, 2025
Authors: Soroush Mehraban, Andrea Iaboni, Babak Taati
cs.AI
Abstract
Recent transformer-based models for 3D Human Mesh Recovery (HMR) have
achieved strong performance but often suffer from high computational cost and
complexity due to deep transformer architectures and redundant tokens. In this
paper, we introduce two HMR-specific merging strategies: Error-Constrained
Layer Merging (ECLM) and Mask-guided Token Merging (Mask-ToMe). ECLM
selectively merges transformer layers that have minimal impact on the Mean Per
Joint Position Error (MPJPE), while Mask-ToMe focuses on merging background
tokens that contribute little to the final prediction. To further address the
potential performance drop caused by merging, we propose a diffusion-based
decoder that incorporates temporal context and leverages pose priors learned
from large-scale motion capture datasets. Experiments across multiple
benchmarks demonstrate that our method achieves up to 2.3x speed-up while
slightly improving performance over the baseline.
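
To make the idea of merging background tokens concrete, the following is a minimal sketch and not the authors' Mask-ToMe implementation. The abstract does not say how tokens are selected or combined, so this sketch assumes a per-token boolean foreground mask is already available and that all background tokens are averaged into a single token. The function name `merge_background_tokens` and its parameters are hypothetical.

```python
from typing import List, Sequence


def merge_background_tokens(
    tokens: Sequence[Sequence[float]],
    is_foreground: Sequence[bool],
) -> List[List[float]]:
    """Keep foreground tokens and average all background tokens into one.

    Illustrative assumption only: the abstract does not specify how
    Mask-ToMe chooses or combines tokens.
    """
    if len(tokens) != len(is_foreground):
        raise ValueError("tokens and is_foreground must have the same length")

    # Foreground tokens pass through unchanged, in their original order.
    kept = [list(t) for t, fg in zip(tokens, is_foreground) if fg]
    background = [t for t, fg in zip(tokens, is_foreground) if not fg]

    # Collapse every background token into a single mean token (assumed rule).
    if background:
        dim = len(background[0])
        mean = [sum(t[i] for t in background) / len(background) for i in range(dim)]
        kept.append(mean)
    return kept


# Toy example: four 2-D tokens, two of them marked as background.
tokens = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0], [2.0, 0.0]]
mask = [True, False, True, False]
merged = merge_background_tokens(tokens, mask)
print(len(tokens), "->", len(merged))
```

Because self-attention cost grows with the number of tokens, shortening the sequence in this way reduces the work done by every later transformer layer, which is the kind of saving the abstract attributes to token merging.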