Fidelity-Aware Data Composition for Robust Robot Generalization
September 29, 2025
Authors: Zizhao Tong, Di Chen, Sicheng Hu, Hongwei Fan, Liliang Chen, Guanghui Ren, Hao Tang, Hao Dong, Ling Shao
cs.AI
Abstract
Generalist robot policies trained on large-scale, visually homogeneous
datasets can be susceptible to shortcut learning, which impairs their
out-of-distribution (OOD) generalization. While generative data augmentation is
a common approach to introduce diversity, it presents a subtle challenge: data
composition. Naively mixing real and synthetic data can corrupt the learning
signal, as this process often prioritizes visual diversity at the expense of
information fidelity. This paper suggests that robust generalization depends on
principled, fidelity-aware data composition. We introduce Coherent Information
Fidelity Tuning (CIFT), a framework that treats data composition as an
optimization problem. CIFT uses a practical proxy for Information Fidelity
based on the feature-space geometry of a dataset. This enables the
identification of a phase transition, termed the Decoherence Point, where
training stability degrades. The framework includes a generative engine,
Multi-View Video Augmentation (MVAug), to synthesize a causally disentangled
data spectrum for this tuning process. Applying CIFT to policy architectures
such as pi_0 and Diffusion Policy improves OOD success rates by over 54%.
These results indicate that fidelity-aware composition, beyond data synthesis
alone, is an important component for developing robust, general-purpose robots.
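The abstract describes using the feature-space geometry of a mixed dataset as a proxy for information fidelity, sweeping real/synthetic composition ratios to locate the point where the proxy degrades. The paper's actual proxy and procedure are not specified here; the sketch below illustrates the general idea with one hypothetical choice of geometric statistic (normalized spectral entropy of the feature covariance) and a hypothetical ratio sweep. All function names and the choice of statistic are assumptions for illustration, not the CIFT method itself.

```python
import numpy as np

def spectral_entropy(features):
    """One possible geometric statistic of a feature cloud (an assumption,
    not the paper's proxy): normalized entropy of the covariance spectrum.
    Values near 1 mean variance is spread across many directions; lower
    values mean it collapses onto a few."""
    X = features - features.mean(axis=0)
    # Singular values of the centered matrix give the covariance spectrum.
    s = np.linalg.svd(X, compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(len(p)))

def sweep_mix_ratios(real_feats, synth_feats, ratios, seed=0):
    """Score mixed datasets at each synthetic-data fraction. A sharp drop
    in the score along the sweep would mark a candidate 'decoherence'
    composition in this toy setting."""
    rng = np.random.default_rng(seed)
    n = min(len(real_feats), len(synth_feats))
    scores = []
    for r in ratios:
        k = int(r * n)  # number of synthetic samples in the mix
        mix = np.vstack([
            real_feats[rng.choice(len(real_feats), n - k, replace=False)],
            synth_feats[rng.choice(len(synth_feats), k, replace=False)],
        ])
        scores.append(spectral_entropy(mix))
    return scores
```

In practice the features would come from a pretrained visual encoder applied to real and generated frames; here any two point clouds of matching dimension can be scored, which makes the sweep easy to sanity-check on synthetic data.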