PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

Abstract

Acquiring labeled datasets for 3D human mesh estimation is challenging because of depth ambiguities and the inherent difficulty of annotating 3D geometry from monocular images. Existing datasets are either real, with manually annotated 3D geometry but limited scale, or synthetic, rendered from 3D engines that provide precise labels but suffer from limited photorealism, low diversity, and high production costs. In this work, we explore a third path: generated data. We introduce PoseDreamer, a novel pipeline that leverages diffusion models to generate large-scale synthetic datasets with 3D mesh annotations. Our approach combines controllable image generation with Direct Preference Optimization (DPO) for control alignment, curriculum-based hard sample mining, and multi-stage quality filtering. Together, these components naturally maintain correspondence between the 3D labels and the generated images while prioritizing challenging samples to maximize dataset utility. Using PoseDreamer, we generate more than 500,000 high-quality synthetic samples, achieving a 76% improvement in image-quality metrics over rendering-based datasets. Models trained on PoseDreamer achieve performance comparable or superior to that of models trained on real-world and traditional synthetic datasets. In addition, combining PoseDreamer with synthetic datasets yields better performance than combining real-world and synthetic datasets, demonstrating the complementary nature of our dataset. We will release the full dataset and generation code.
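
The abstract does not specify the filtering criteria or the mining schedule, so the following is only an illustrative sketch of how multi-stage quality filtering followed by curriculum-based hard sample mining could be organized. It assumes each generated sample carries three hypothetical scores: an image-quality score, a keypoint alignment error between the generated image and its 3D label, and a difficulty score such as the current model's loss. The class `Sample`, the functions `multi_stage_filter` and `curriculum_hard_mining`, and all thresholds are hypothetical names introduced here, not PoseDreamer's actual code. The "curriculum" is modeled as a fixed-size selection window that slides from easier to harder samples as training progresses, which is one possible interpretation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical per-sample scores (assumption, not from the paper).
    sample_id: int
    image_quality: float  # higher is better, e.g. an IQA score in [0, 1]
    pose_error: float     # keypoint error between generated image and 3D label (pixels)
    difficulty: float     # higher is harder, e.g. current model's loss on the sample


def multi_stage_filter(samples: List[Sample], min_quality: float,
                       max_pose_error: float) -> List[Sample]:
    """Apply two filters in sequence: image quality, then label-image alignment."""
    # Stage 1: discard visually poor generations.
    stage1 = [s for s in samples if s.image_quality >= min_quality]
    # Stage 2: discard images whose pose no longer matches the 3D label.
    stage2 = [s for s in stage1 if s.pose_error <= max_pose_error]
    return stage2


def curriculum_hard_mining(samples: List[Sample], step: int, total_steps: int,
                           keep_fraction: float) -> List[Sample]:
    """Keep a fixed fraction of samples; the window shifts toward harder ones as step grows."""
    if not samples:
        return []
    ranked = sorted(samples, key=lambda s: s.difficulty)  # easiest first
    n = len(ranked)
    k = max(1, int(round(keep_fraction * n)))
    # Training progress clamped to [0, 1]; 0 selects the easiest window, 1 the hardest.
    progress = step / total_steps if total_steps > 0 else 1.0
    progress = min(max(progress, 0.0), 1.0)
    start = int(round(progress * (n - k)))
    return ranked[start:start + k]
```

For example, after filtering with `min_quality=0.5` and `max_pose_error=10.0`, calling `curriculum_hard_mining(kept, step, total_steps, 0.5)` returns the easier half of the surviving samples at `step=0` and the hardest half at `step=total_steps`. Filtering before mining is a deliberate choice in this sketch: otherwise, misaligned generations (large `pose_error`) could be mistaken for "hard" samples simply because a model trained on correct labels performs poorly on them.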
