Exploring Conditions for Diffusion Models in Robotic Control

October 17, 2025
Authors: Heeseong Shin, Byeongho Heo, Dongyoon Han, Seungryong Kim, Taekyung Kim
cs.AI

Abstract

While pre-trained visual representations have significantly advanced imitation learning, they are often task-agnostic, as they remain frozen during policy learning. In this work, we explore leveraging pre-trained text-to-image diffusion models to obtain task-adaptive visual representations for robotic control, without fine-tuning the model itself. However, we find that naively applying textual conditions, a strategy that succeeds in other vision domains, yields minimal or even negative gains in control tasks. We attribute this to the domain gap between the diffusion model's training data and robotic control environments, leading us to argue for conditions that account for the specific, dynamic visual information required for control. To this end, we propose ORCA, which introduces learnable task prompts that adapt to the control environment and visual prompts that capture fine-grained, frame-specific details. By facilitating task-adaptive representations with these newly devised conditions, our approach achieves state-of-the-art performance on various robotic control benchmarks, significantly surpassing prior methods.
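
The abstract does not include implementation details, but the conditioning idea it describes can be sketched in code. The snippet below is a minimal, illustrative PyTorch sketch: the names `ORCASketch` and `DummyDiffusionFeatures`, the prompt shapes, and the backbone's `(image, cond)` interface are all assumptions made for illustration, not the authors' actual API.

```python
import torch
import torch.nn as nn


class DummyDiffusionFeatures(nn.Module):
    """Stand-in for a frozen text-to-image diffusion feature extractor.

    A real backbone (e.g. a Stable Diffusion UNet) would inject the
    condition tokens via cross-attention; this dummy ignores them and
    simply pools convolutional features, so the sketch stays runnable.
    """

    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        self.proj = nn.Conv2d(3, feat_dim, kernel_size=8, stride=8)

    def forward(self, image: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.proj(image).mean(dim=(2, 3))  # (B, feat_dim)


class ORCASketch(nn.Module):
    """Frozen diffusion backbone + two learned condition types + policy head."""

    def __init__(self, backbone: nn.Module, n_task_tokens: int = 8,
                 cond_dim: int = 768, feat_dim: int = 1024, action_dim: int = 7):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # the diffusion model itself is not fine-tuned

        # Learnable task prompts: embeddings that adapt the conditioning
        # to the control environment, replacing generic text captions.
        self.task_prompts = nn.Parameter(torch.randn(n_task_tokens, cond_dim))

        # Visual prompt encoder: maps the current frame to fine-grained,
        # frame-specific condition tokens (a tiny CNN, purely illustrative).
        self.visual_prompt_enc = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv2d(64, cond_dim, kernel_size=4, stride=4),
        )

        # Policy head consuming the diffusion features as its visual input.
        self.policy = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.GELU(), nn.Linear(512, action_dim),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        b = frame.shape[0]
        # Frame-specific visual prompts, flattened into condition tokens.
        vis = self.visual_prompt_enc(frame).flatten(2).transpose(1, 2)
        cond = torch.cat([self.task_prompts.expand(b, -1, -1), vis], dim=1)
        feats = self.backbone(frame, cond)  # task-adaptive representation
        return self.policy(feats)


model = ORCASketch(DummyDiffusionFeatures())
actions = model(torch.randn(2, 3, 224, 224))  # -> shape (2, 7)
```

Note that in this sketch only the task prompts, the visual prompt encoder, and the policy head are trainable; the diffusion backbone stays frozen throughout, which matches the abstract's claim of obtaining task-adaptive representations "without fine-tuning the model itself."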