AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

May 21, 2026
Authors: Baiyu Chen, Zechen Li, Wilson Wongso, Lihuan Li, Xiachong Lin, Hao Xue, Benjamin Tag, Flora Salim
cs.AI

Abstract

As wearable and mobile devices become increasingly embedded in daily life, they offer a practical way to continuously sense human motion in the wild. However, inertial signals are highly dependent on the sensing setup, including body location, mounting position, sensor orientation, device hardware, and sampling protocol. This setup dependence makes it difficult to learn motion representations that transfer across devices and datasets, and it limits the broader use of wearable inertial measurement units (IMUs) beyond closed-set recognition. We introduce AnyMo, a geometry-aware framework for setup-agnostic human motion modeling. AnyMo uses physics-grounded IMU simulation over dense body-surface placements to generate diverse and plausible synthetic signals, pre-trains a graph encoder on paired synthetic placement views and masked partial observations, tokenizes multi-position IMU signals into full-body motion tokens, and aligns these tokens with a large language model (LLM) for motion-language understanding. We evaluate AnyMo on three complementary tasks: zero-shot human activity recognition (HAR) across 14 unseen downstream datasets, cross-modal retrieval, and wearable IMU motion captioning. On HAR, AnyMo improves average Accuracy/F1/R@2 by 11.7%/11.6%/22.6%. On zero-shot retrieval, it increases the mean reciprocal rank (MRR) by 15.9% for IMU-to-text and 28.6% for text-to-IMU. On zero-shot captioning, it improves BERT-F1 by 18.8%. These results support AnyMo as a generalist model for wearable motion understanding in the wild. Project page: https://baiyuchen.com/project/AnyMo.
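To make the setup dependence concrete, the following is a minimal sketch of one generic way to synthesize accelerometer readings for a sensor rigidly attached to a body segment. It is not AnyMo's simulator. The z-up world frame, the convention that rotation matrices map local axes to world axes, the finite-difference acceleration, and the helper names `place_sensor` and `simulate_accelerometer` are all illustrative assumptions; gyroscope output, sensor noise, and resampling are omitted. Changing only `offset` (the body-surface placement) or `mount_rot` (the mounting orientation) changes the readings for the same underlying motion.

```python
# Hypothetical sketch, not AnyMo's simulator: accelerometer readings for a sensor
# rigidly attached to a body segment. Assumes a z-up world frame, 3x3 rotation
# matrices (nested lists) that map local axes to world axes, and acceleration
# from central finite differences. Gyroscope, noise, and resampling are omitted.
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]

GRAVITY: Vec3 = (0.0, 0.0, -9.81)  # gravitational acceleration in the world frame, m/s^2


def mat_vec(m: Mat3, v: Sequence[float]) -> Vec3:
    """Return the 3-vector m @ v."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def mat_mul(a: Mat3, b: Mat3) -> List[List[float]]:
    """Return the 3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m: Mat3) -> List[List[float]]:
    """Return m^T, which is the inverse of a rotation matrix."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def place_sensor(seg_pos: Sequence[Vec3], seg_rot: Sequence[Mat3],
                 offset: Vec3, mount_rot: Mat3) -> Tuple[List[Vec3], List[List[List[float]]]]:
    """World-frame pose of a sensor attached to a segment, one entry per frame.

    offset: hypothetical body-surface placement, expressed in the segment frame.
    mount_rot: hypothetical sensor-to-segment rotation (mounting orientation).
    """
    positions, rotations = [], []
    for p, r in zip(seg_pos, seg_rot):
        d = mat_vec(r, offset)  # placement offset rotated into the world frame
        positions.append(tuple(p[i] + d[i] for i in range(3)))
        rotations.append(mat_mul(r, mount_rot))  # sensor-to-world rotation
    return positions, rotations


def simulate_accelerometer(positions: Sequence[Vec3], rotations: Sequence[Mat3],
                           dt: float) -> List[Vec3]:
    """Specific force R_t^T (a_t - g) in the sensor frame for interior frames.

    The central second difference needs a neighbor on each side, so a sequence
    of T frames yields T - 2 readings.
    """
    readings = []
    for t in range(1, len(positions) - 1):
        acc = tuple((positions[t + 1][i] - 2.0 * positions[t][i] + positions[t - 1][i]) / dt ** 2
                    for i in range(3))
        specific_force = tuple(acc[i] - GRAVITY[i] for i in range(3))
        readings.append(mat_vec(transpose(rotations[t]), specific_force))
    return readings


if __name__ == "__main__":
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    RX90 = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]  # +90 degrees about x
    still_pos = [(0.0, 0.0, 1.0)] * 5  # stationary segment, 5 frames
    still_rot = [I3] * 5
    for mount in (I3, RX90):
        pos, rot = place_sensor(still_pos, still_rot, (0.1, 0.0, 0.0), mount)
        print(simulate_accelerometer(pos, rot, dt=0.01)[0])
```

For the stationary segment, the identity mount reads roughly (0, 0, 9.81) m/s^2, while the mount rotated 90 degrees about the segment's x-axis moves that same gravity reaction onto the sensor's y-axis. This is one simple instance of how an identical motion yields different inertial signals under different setups.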