

Learning Versatile Humanoid Manipulation with Touch Dreaming

April 14, 2026
作者: Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao
cs.AI

Abstract

Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution during complex manipulation. Built on this controller, we develop a whole-body humanoid data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping, enabling efficient collection of real-world demonstrations. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, encouraging the shared Transformer trunk to learn contact-aware representations for dexterous interaction. Across five contact-rich tasks (Insert-T, Book Organization, Towel Folding, Cat Litter Scooping, and Tea Serving), HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that combining robust whole-body execution, scalable humanoid data collection, and predictive touch-centered learning enables versatile, high-dexterity humanoid manipulation in the real world. Project webpage: humanoid-touch-dream.github.io.
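The single-stage objective described above, behavioral cloning on action chunks augmented by auxiliary "touch dreaming" predictions of future hand-joint forces and future tactile latents, can be sketched as a weighted sum of per-head losses. This is a minimal illustration only: the loss weights, the use of mean-squared error, and the head names are assumptions, not details from the paper.

```python
import numpy as np

def htd_training_loss(pred, target, w_force=0.1, w_latent=0.1):
    """Sketch of a single-stage HTD-style objective (assumed form).

    pred / target are dicts of arrays produced by a shared trunk with
    three heads:
      "actions"         -- the behavioral-cloning action chunk
      "forces"          -- predicted future hand-joint forces
      "tactile_latents" -- predicted future tactile latent codes
    All three terms use MSE here; the weights w_force and w_latent are
    hypothetical values, not taken from the paper.
    """
    bc_loss = np.mean((pred["actions"] - target["actions"]) ** 2)
    force_loss = np.mean((pred["forces"] - target["forces"]) ** 2)
    latent_loss = np.mean((pred["tactile_latents"] - target["tactile_latents"]) ** 2)
    return bc_loss + w_force * force_loss + w_latent * latent_loss

# Usage: a chunk of 8 timesteps with 4 action dims, 2 force channels,
# and a 16-dim tactile latent (shapes chosen for illustration only).
pred = {
    "actions": np.zeros((8, 4)),
    "forces": np.zeros((8, 2)),
    "tactile_latents": np.zeros((8, 16)),
}
target = {k: np.copy(v) for k, v in pred.items()}
print(htd_training_loss(pred, target))  # perfect prediction -> 0.0
```

The ablation reported in the abstract corresponds to swapping the `tactile_latents` term: predicting latent codes from a learned tactile encoder rather than raw tactile readings is what yields the reported 30% relative gain.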