ORV: 4D Occupancy-centric Robot Video Generation

Abstract

Acquiring real-world robotic simulation data through teleoperation is notoriously time-consuming and labor-intensive. Recently, action-driven generative models have been widely adopted in robot learning and simulation because they eliminate safety concerns and reduce maintenance effort. However, the action sequences these methods rely on are only coarsely aligned at a global level, which often limits control precision and leads to poor generalization. To address these limitations, we propose ORV, an Occupancy-centric Robot Video generation framework that uses 4D semantic occupancy sequences as a fine-grained representation, providing more accurate semantic and geometric guidance for video generation. By leveraging occupancy-based representations, ORV translates simulation data into photorealistic robot videos while ensuring high temporal consistency and precise controllability. Furthermore, our framework supports the simultaneous generation of multi-view videos of robot gripping operations, an important capability for downstream robotic learning tasks. Extensive experimental results demonstrate that ORV consistently outperforms existing baseline methods across various datasets and sub-tasks. Demo, Code and Model: https://orangesodahub.github.io/ORV

