PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

Abstract

Acquiring labeled datasets for 3D human mesh estimation is challenging due to depth ambiguities and the inherent difficulty of annotating 3D geometry from monocular images. Existing datasets are either real, with manually annotated 3D geometry and limited scale, or synthetic, rendered from 3D engines that provide precise labels but suffer from limited photorealism, low diversity, and high production costs. In this work, we explore a third path: generated data. We introduce PoseDreamer, a novel pipeline that leverages diffusion models to generate large-scale synthetic datasets with 3D mesh annotations. Our approach combines controllable image generation with Direct Preference Optimization (DPO) for control alignment, curriculum-based hard sample mining, and multi-stage quality filtering. Together, these components naturally maintain correspondence between 3D labels and generated images, while prioritizing challenging samples to maximize dataset utility. Using PoseDreamer, we generate more than 500,000 high-quality synthetic samples, achieving a 76% improvement in image-quality metrics compared to rendering-based datasets. Models trained on PoseDreamer achieve performance comparable or superior to those trained on real-world and traditional synthetic datasets. In addition, combining PoseDreamer with synthetic datasets yields better performance than combining real-world and synthetic datasets, demonstrating the complementary nature of our dataset. We will release the full dataset and generation code.
