Fidelity-Aware Data Composition for Robust Robot Generalization

Abstract

Generalist robot policies trained on large-scale, visually homogeneous datasets can be susceptible to shortcut learning, which impairs their out-of-distribution (OOD) generalization. While generative data augmentation is a common approach to introduce diversity, it presents a subtle challenge: data composition. Naively mixing real and synthetic data can corrupt the learning signal, as this process often prioritizes visual diversity at the expense of information fidelity. This paper suggests that robust generalization depends on principled, fidelity-aware data composition. We introduce Coherent Information Fidelity Tuning (CIFT), a framework that treats data composition as an optimization problem. CIFT uses a practical proxy for Information Fidelity based on the feature-space geometry of a dataset. This enables the identification of a phase transition, termed the Decoherence Point, where training stability degrades. The framework includes a generative engine, Multi-View Video Augmentation (MVAug), to synthesize a causally disentangled data spectrum for this tuning process. Applying CIFT to policy architectures such as pi_0 and Diffusion Policy improves OOD success rates by over 54%. These results indicate that fidelity-aware composition, beyond data synthesis alone, is an important component for developing robust, general-purpose robots.
