DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

January 29, 2026
Authors: Haozhe Xie, Beichen Wen, Jiarui Zheng, Zhaoxi Chen, Fangzhou Hong, Haiwen Diao, Ziwei Liu
cs.AI

Abstract

Manipulating dynamic objects remains an open challenge for Vision-Language-Action (VLA) models, which, despite strong generalization in static manipulation, struggle in dynamic scenarios requiring rapid perception, temporal anticipation, and continuous control. We present DynamicVLA, a framework for dynamic object manipulation that integrates temporal reasoning and closed-loop adaptation through three key designs: 1) a compact 0.4B-parameter VLA with a convolutional vision encoder for spatially efficient, structurally faithful encoding, enabling fast multimodal inference; 2) Continuous Inference, which overlaps reasoning with execution to lower latency and adapt promptly to object motion; and 3) Latent-aware Action Streaming, which bridges the perception-execution gap by enforcing temporally aligned action execution. To fill the missing foundation of dynamic manipulation data, we introduce the Dynamic Object Manipulation (DOM) benchmark, built from scratch with an automatic data collection pipeline that efficiently gathers 200K synthetic episodes across 2.8K scenes and 206 objects and enables fast collection of 2K real-world episodes without teleoperation. Extensive evaluations demonstrate marked improvements in response speed, perception, and generalization, positioning DynamicVLA as a unified framework for general dynamic object manipulation across embodiments.
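
The core idea behind Continuous Inference, inferring the next action chunk while the previous one is still being executed, can be sketched in a few lines. The snippet below is a minimal illustration only, not the authors' implementation: policy, observe, execute, CHUNK_LEN, and CTRL_DT are hypothetical placeholders, and a producer thread runs inference in parallel with an execution loop that streams actions to the robot, so inference latency is hidden behind execution.

```python
# Minimal sketch of overlapping reasoning and execution ("Continuous Inference").
# All names below are hypothetical stand-ins, not APIs from the paper.

import threading
import queue
import time

CHUNK_LEN = 8    # actions produced per inference call (assumed)
CTRL_DT = 0.02   # control period in seconds (assumed 50 Hz)

def policy(observation):
    """Stand-in for the VLA forward pass: returns a chunk of actions."""
    time.sleep(0.05)  # simulated inference latency
    return [f"action({observation}, t={i})" for i in range(CHUNK_LEN)]

def observe(step):
    """Stand-in for grabbing the latest camera frame / proprioception."""
    return f"obs@{step}"

def execute(action):
    """Stand-in for sending one low-level command to the robot."""
    time.sleep(CTRL_DT)

def run(num_chunks=5):
    chunks = queue.Queue(maxsize=1)  # tiny buffer keeps actions fresh

    def inference_loop():
        for step in range(num_chunks):
            # Computes the next chunk while the loop below executes the previous one.
            chunks.put(policy(observe(step)))

    threading.Thread(target=inference_loop, daemon=True).start()

    for _ in range(num_chunks):
        for action in chunks.get():  # stream actions as soon as a chunk is ready
            execute(action)

if __name__ == "__main__":
    run()
```

With a buffer of size one, inference never runs more than one chunk ahead of execution, which keeps the executed actions conditioned on recent observations while still removing inference latency from the control loop.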