ORV: 4D Occupancy-centric Robot Video Generation

Abstract

Acquiring real-world robotic simulation data through teleoperation is notoriously time-consuming and labor-intensive. Recently, action-driven generative models have been widely adopted in robot learning and simulation because they eliminate safety concerns and reduce maintenance effort. However, the action sequences used in these methods provide only a globally coarse alignment, which limits control precision and generalization. To address these limitations, we propose ORV, an Occupancy-centric Robot Video generation framework that uses 4D semantic occupancy sequences as a fine-grained representation to provide more accurate semantic and geometric guidance for video generation. By leveraging occupancy-based representations, ORV seamlessly translates simulation data into photorealistic robot videos while ensuring high temporal consistency and precise controllability. Furthermore, our framework supports the simultaneous generation of multi-view videos of robot gripping operations, an important capability for downstream robot learning tasks. Extensive experimental results demonstrate that ORV consistently outperforms existing baseline methods across various datasets and sub-tasks. Demo, code, and model: https://orangesodahub.github.io/ORV
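To make the idea of a 4D semantic occupancy sequence concrete, here is a minimal, hypothetical Python sketch: a sequence of 3D voxel grids (one per frame) in which every voxel stores a semantic label, plus an orthographic side-view projection that turns each frame into a 2D semantic map. The label ids, grid size, toy scene, and the projection step are all illustrative assumptions, not ORV's actual data format or conditioning pipeline.

```python
from typing import List

# Hypothetical semantic label ids, for illustration only (not ORV's label set).
EMPTY, TABLE, GRIPPER, OBJECT = 0, 1, 2, 3

# One frame of occupancy: grid[x][y][z] -> semantic label.
Grid3D = List[List[List[int]]]


def make_sequence(num_frames: int, size: int) -> List[Grid3D]:
    """Build a toy 4D sequence (time x 3D voxels): a table slab at z=0,
    an object resting on it, and a gripper that descends one voxel per
    frame until it sits just above the object."""
    frames = []
    for t in range(num_frames):
        grid = [[[EMPTY] * size for _ in range(size)] for _ in range(size)]
        for x in range(size):
            for y in range(size):
                grid[x][y][0] = TABLE  # table surface
        grid[1][1][1] = OBJECT  # object on the table
        gripper_z = max(size - 1 - t, 2)  # descend, but stay above the object
        grid[1][1][gripper_z] = GRIPPER
        frames.append(grid)
    return frames


def side_view_semantic_map(grid: Grid3D) -> List[List[int]]:
    """Orthographic projection along +y: each (x, z) pixel takes the label
    of the first occupied voxel a camera at y=0 would see."""
    size_x, size_y, size_z = len(grid), len(grid[0]), len(grid[0][0])
    out = [[EMPTY] * size_z for _ in range(size_x)]
    for x in range(size_x):
        for z in range(size_z):
            for y in range(size_y):
                if grid[x][y][z] != EMPTY:
                    out[x][z] = grid[x][y][z]
                    break
    return out


if __name__ == "__main__":
    sequence = make_sequence(num_frames=3, size=4)
    views = [side_view_semantic_map(frame) for frame in sequence]
    for t, view in enumerate(views):
        print(f"frame {t}: column x=1 (z=0..3) -> {view[1]}")
```

Stacking the per-frame maps yields a camera-aligned sequence of dense, per-pixel semantic labels that changes as the gripper moves, which illustrates the kind of fine-grained spatial guidance that a single global action vector per step does not carry. A real pipeline would use a calibrated perspective camera per view rather than this axis-aligned projection.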