

Fidelity-Aware Data Composition for Robust Robot Generalization

September 29, 2025
作者: Zizhao Tong, Di Chen, Sicheng Hu, Hongwei Fan, Liliang Chen, Guanghui Ren, Hao Tang, Hao Dong, Ling Shao
cs.AI

Abstract

Generalist robot policies trained on large-scale, visually homogeneous datasets can be susceptible to shortcut learning, which impairs their out-of-distribution (OOD) generalization. While generative data augmentation is a common approach to introduce diversity, it presents a subtle challenge: data composition. Naively mixing real and synthetic data can corrupt the learning signal, as this process often prioritizes visual diversity at the expense of information fidelity. This paper suggests that robust generalization depends on principled, fidelity-aware data composition. We introduce Coherent Information Fidelity Tuning (CIFT), a framework that treats data composition as an optimization problem. CIFT uses a practical proxy for Information Fidelity based on the feature-space geometry of a dataset. This enables the identification of a phase transition, termed the Decoherence Point, where training stability degrades. The framework includes a generative engine, Multi-View Video Augmentation (MVAug), to synthesize a causally disentangled data spectrum for this tuning process. Applying CIFT to policy architectures such as pi_0 and Diffusion Policy improves OOD success rates by over 54%. These results indicate that fidelity-aware composition, beyond data synthesis alone, is an important component for developing robust, general-purpose robots.
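The abstract does not specify how CIFT's feature-space proxy or the Decoherence Point are computed. The sketch below is only a toy illustration of the general idea, assuming features come from the policy's visual encoder and using mean-direction cosine similarity as a hypothetical stand-in for the fidelity proxy; the mixing sweep and threshold are likewise illustrative, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for encoder features of real and synthetic frames
# (assumption: real features cluster in a consistent direction,
# low-fidelity synthetic features do not).
real = rng.normal(loc=1.0, scale=0.2, size=(500, 64))
synthetic = rng.normal(loc=0.0, scale=1.0, size=(500, 64))

def fidelity_proxy(real_feats, mixed_feats):
    """Cosine similarity between mean feature directions --
    an illustrative geometric proxy, not CIFT's actual measure."""
    a = real_feats.mean(axis=0)
    b = mixed_feats.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_decoherence_point(real, synthetic, threshold=0.9, steps=21):
    """Sweep the synthetic fraction of the training mix and return
    the first ratio at which the proxy falls below `threshold`
    (None if it never does) -- a cartoon of locating the phase
    transition the abstract calls the Decoherence Point."""
    n = len(real)
    for r in np.linspace(0.0, 1.0, steps):
        k = int(r * n)
        mixed = np.vstack([real[: n - k], synthetic[:k]])
        if fidelity_proxy(real, mixed) < threshold:
            return float(r)
    return None
```

In this picture, composition ratios below the detected threshold preserve the learning signal, while ratios above it trade information fidelity for visual diversity, which is the trade-off the abstract argues must be tuned rather than ignored.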