
Exploring Conditions for Diffusion Models in Robotic Control

October 17, 2025
Authors: Heeseong Shin, Byeongho Heo, Dongyoon Han, Seungryong Kim, Taekyung Kim
cs.AI

Abstract

While pre-trained visual representations have significantly advanced imitation learning, they are often task-agnostic because they remain frozen during policy learning. In this work, we explore leveraging pre-trained text-to-image diffusion models to obtain task-adaptive visual representations for robotic control without fine-tuning the model itself. However, we find that naively applying textual conditions, a strategy that succeeds in other vision domains, yields minimal or even negative gains in control tasks. We attribute this to the domain gap between the diffusion model's training data and robotic control environments, which leads us to argue for conditions that capture the specific, dynamic visual information required for control. To this end, we propose ORCA, which introduces learnable task prompts that adapt to the control environment and visual prompts that capture fine-grained, frame-specific details. By facilitating task-adaptive representations through these newly devised conditions, our approach achieves state-of-the-art performance on various robotic control benchmarks, significantly surpassing prior methods.
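
The abstract describes the conditioning mechanism only at a high level. Below is a minimal, self-contained sketch of the general idea, not the authors' ORCA implementation: a frozen backbone stand-in (FrozenDiffusionBackbone) receives condition tokens built from learnable task prompts plus a visual prompt computed from the current frame. All module names, shapes, and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only (NOT the ORCA code): learnable task prompts and a
# per-frame visual prompt condition a frozen feature extractor that stands in
# for a pre-trained text-to-image diffusion UNet.
import torch
import torch.nn as nn


class FrozenDiffusionBackbone(nn.Module):
    """Stand-in for a pre-trained diffusion UNet whose intermediate
    features are read out; all of its weights stay frozen."""

    def __init__(self, feat_dim=256, cond_dim=128):
        super().__init__()
        self.image_proj = nn.Conv2d(3, feat_dim, kernel_size=8, stride=8)
        self.cond_proj = nn.Linear(cond_dim, feat_dim)
        for p in self.parameters():
            p.requires_grad_(False)  # frozen, matching the paper's setting

    def forward(self, frames, cond_tokens):
        # frames: (B, 3, H, W); cond_tokens: (B, T, cond_dim)
        feats = self.image_proj(frames).flatten(2).transpose(1, 2)  # (B, N, D)
        cond = self.cond_proj(cond_tokens).mean(dim=1, keepdim=True)
        return feats + cond  # crude stand-in for cross-attention conditioning


class TaskAdaptiveConditioner(nn.Module):
    """Learnable task prompts (adapted to the control environment) plus a
    visual prompt from the current frame, concatenated as condition tokens."""

    def __init__(self, num_task_tokens=4, cond_dim=128):
        super().__init__()
        self.task_prompts = nn.Parameter(torch.randn(num_task_tokens, cond_dim) * 0.02)
        self.visual_prompt_enc = nn.Sequential(
            nn.Conv2d(3, cond_dim, kernel_size=16, stride=16),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, frames):
        b = frames.size(0)
        task = self.task_prompts.unsqueeze(0).expand(b, -1, -1)  # (B, T, C)
        visual = self.visual_prompt_enc(frames).unsqueeze(1)     # (B, 1, C)
        return torch.cat([task, visual], dim=1)


backbone = FrozenDiffusionBackbone()
conditioner = TaskAdaptiveConditioner()  # only these parameters are trained
frames = torch.randn(2, 3, 64, 64)
rep = backbone(frames, conditioner(frames))  # task-adaptive representation
print(rep.shape)  # torch.Size([2, 64, 256])
```

In a real setup one would presumably read features from intermediate layers of an actual pre-trained diffusion model and feed the resulting representation into a policy head, with gradients flowing only into the conditioner's prompts, consistent with the abstract's claim that the diffusion model itself is never fine-tuned.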