CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Abstract

Crowd Motion Generation is essential both in entertainment industries such as animation and games and in strategic fields such as urban simulation and planning. This new task requires an intricate integration of control and generation to realistically synthesize crowd dynamics under specific spatial and semantic constraints, and its challenges have yet to be fully explored. On the one hand, existing human motion generation models typically focus on individual behaviors and neglect the complexities of collective behavior. On the other hand, recent methods for multi-person motion generation depend heavily on pre-defined scenarios and are limited to a fixed, small number of inter-person interactions, which hampers their practicality. To overcome these challenges, we introduce CrowdMoGen, a zero-shot text-driven framework that harnesses the power of a Large Language Model (LLM) to incorporate collective intelligence into the motion generation framework as guidance, thereby enabling generalizable planning and generation of crowd motions without paired training data. Our framework consists of two key components: 1) a Crowd Scene Planner that learns to coordinate motions and dynamics according to specific scene contexts or introduced perturbations, and 2) a Collective Motion Generator that efficiently synthesizes the required collective motions based on the holistic plans. Extensive quantitative and qualitative experiments validate the effectiveness of our framework, which not only fills a critical gap by providing a scalable and generalizable solution for the Crowd Motion Generation task but also achieves high levels of realism and flexibility.
