SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending

Abstract

There is growing interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce purely artificial environments and fall short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that uses generative AI techniques to blend users' physical surroundings into unified virtual spaces. The pipeline transforms user-provided 2D images into context-rich 3D environments through an iterative process consisting of depth estimation, mesh alignment, and diffusion-based space completion guided by geometric priors and adaptive text prompts. In a preliminary within-subjects study, in which 20 participants performed a collaborative VR affinity diagramming task in pairs, we compared SpaceBlender with a generic virtual environment and a state-of-the-art scene generation framework, evaluating its ability to create virtual spaces suitable for collaboration. Participants appreciated the enhanced familiarity and context provided by SpaceBlender, but also noted complexities in the generated environments that could detract from task focus. Drawing on participant feedback, we propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.