Fidelity-Aware Data Composition for Robust Robot Generalization
September 29, 2025
Authors: Zizhao Tong, Di Chen, Sicheng Hu, Hongwei Fan, Liliang Chen, Guanghui Ren, Hao Tang, Hao Dong, Ling Shao
cs.AI
Abstract
Generalist robot policies trained on large-scale, visually homogeneous
datasets can be susceptible to shortcut learning, which impairs their
out-of-distribution (OOD) generalization. While generative data augmentation is
a common approach to introduce diversity, it presents a subtle challenge: data
composition. Naively mixing real and synthetic data can corrupt the learning
signal, as this process often prioritizes visual diversity at the expense of
information fidelity. This paper suggests that robust generalization depends on
principled, fidelity-aware data composition. We introduce Coherent Information
Fidelity Tuning (CIFT), a framework that treats data composition as an
optimization problem. CIFT uses a practical proxy for Information Fidelity
based on the feature-space geometry of a dataset. This enables the
identification of a phase transition, termed the Decoherence Point, where
training stability degrades. The framework includes a generative engine,
Multi-View Video Augmentation (MVAug), to synthesize a causally disentangled
data spectrum for this tuning process. Applying CIFT to policy architectures
such as pi_0 and Diffusion Policy improves OOD success rates by over 54%.
These results indicate that fidelity-aware composition, beyond data synthesis
alone, is an important component for developing robust, general-purpose robots.
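The abstract describes using the feature-space geometry of a mixed dataset as a proxy for information fidelity, sweeping real/synthetic composition ratios to locate the point where the proxy degrades. The paper's actual proxy and procedure are not specified here; the sketch below illustrates the general idea with one hypothetical choice of geometric statistic (normalized spectral entropy of the feature covariance) and a hypothetical ratio sweep. All function names and the choice of statistic are assumptions for illustration, not the CIFT method itself.

```python
import numpy as np

def spectral_entropy(features):
    """One possible geometric statistic of a feature cloud (an assumption,
    not the paper's proxy): normalized entropy of the covariance spectrum.
    Values near 1 mean variance is spread across many directions; lower
    values mean it collapses onto a few."""
    X = features - features.mean(axis=0)
    # Singular values of the centered matrix give the covariance spectrum.
    s = np.linalg.svd(X, compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(len(p)))

def sweep_mix_ratios(real_feats, synth_feats, ratios, seed=0):
    """Score mixed datasets at each synthetic-data fraction. A sharp drop
    in the score along the sweep would mark a candidate 'decoherence'
    composition in this toy setting."""
    rng = np.random.default_rng(seed)
    n = min(len(real_feats), len(synth_feats))
    scores = []
    for r in ratios:
        k = int(r * n)  # number of synthetic samples in the mix
        mix = np.vstack([
            real_feats[rng.choice(len(real_feats), n - k, replace=False)],
            synth_feats[rng.choice(len(synth_feats), k, replace=False)],
        ])
        scores.append(spectral_entropy(mix))
    return scores
```

In practice the features would come from a pretrained visual encoder applied to real and generated frames; here any two point clouds of matching dimension can be scored, which makes the sweep easy to sanity-check on synthetic data.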