Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal
May 27, 2026
Author: Junlin Wang
cs.AI
Abstract
Learning visuomotor policies via behavior cloning typically involves mimicking expert demonstrations collected by human operators. However, natural human demonstrations inherently contain high-frequency noise, such as intermittent jerks, pauses, and action jitter. Training policies to directly imitate these raw trajectories inevitably causes the model to inherit these suboptimal behaviors. This pathology is particularly pronounced in diffusion-based policies, where iterative denoising steps can inadvertently amplify high-frequency artifacts at the expense of meaningful fine-grained details. To address these limitations, we present a novel frequency-based algorithm that enables implicit spectral maneuvering and smooth action generation. Our method, Frequency Guidance Operator (FGO), steers the generation process of diffusion policies by progressively driving the noisy samples through intermediate sub-frequency manifolds with expanding spectral bands. Validated on 15 robotic manipulation tasks from 5 benchmarks, FGO achieves superior performance in enhancing action smoothness and temporal consistency while preserving the details necessary for successful task execution. Project website: https://henrywjl.github.io/frequency-guidance-operator/
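The abstract does not specify how FGO is implemented, so the following is only an illustrative sketch of the general idea of sampling under an expanding spectral band, not the authors' operator. It assumes an action chunk of shape (horizon, action_dim), a hard low-pass projection along the time axis via the real FFT, and a linear band schedule; the names `lowpass_project`, `band_schedule`, `guided_sampling`, and the toy `denoise_step` are all hypothetical.

```python
import numpy as np


def lowpass_project(actions, keep_bins):
    """Keep only the lowest `keep_bins` temporal frequency bins.

    actions: array of shape (horizon, action_dim).
    Assumption: a hard spectral cutoff stands in for a "sub-frequency manifold".
    """
    spectrum = np.fft.rfft(actions, axis=0)
    spectrum[keep_bins:] = 0.0  # zero every bin above the current band
    return np.fft.irfft(spectrum, n=actions.shape[0], axis=0)


def band_schedule(step, num_steps, num_bins, min_bins=1):
    """Hypothetical linear schedule: the band widens until it covers all bins."""
    frac = (step + 1) / num_steps
    return max(min_bins, int(round(frac * num_bins)))


def guided_sampling(denoise_step, horizon, action_dim, num_steps, seed=0):
    """Alternate a denoising update with projection onto the current band."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((horizon, action_dim))
    num_bins = horizon // 2 + 1  # number of rfft bins for this horizon
    for step in range(num_steps):
        x = denoise_step(x, step)  # placeholder for a learned denoiser
        x = lowpass_project(x, band_schedule(step, num_steps, num_bins))
    return x


# Toy usage: a stand-in "denoiser" that only shrinks the sample.
sample = guided_sampling(lambda x, step: 0.9 * x,
                         horizon=16, action_dim=7, num_steps=10)
print(sample.shape)
```

In this sketch the early steps are restricted to slow, smooth motion components, and higher frequencies are admitted only near the end of sampling, when the band reaches the full spectrum. A real diffusion policy would replace the toy denoiser with its trained network and noise schedule.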