Repositioning the Subject within Image

Abstract

Current image manipulation primarily centers on static manipulation, such as replacing specific regions within an image or altering its overall style. In this paper, we introduce an innovative dynamic manipulation task, subject repositioning, which relocates a user-specified subject to a desired position while preserving the image's fidelity. We show that the fundamental sub-tasks of subject repositioning (filling the void left by the repositioned subject, reconstructing obscured portions of the subject, and blending the subject so that it is consistent with the surrounding areas) can be effectively reformulated as a unified, prompt-guided inpainting task. Consequently, a single diffusion generative model can address all of these sub-tasks, using different task prompts learned through our proposed task inversion technique. Additionally, we integrate pre-processing and post-processing techniques to further enhance the quality of subject repositioning. These elements together form our SEgment-gEnerate-and-bLEnd (SEELE) framework. To assess SEELE's effectiveness in subject repositioning, we assemble a real-world subject repositioning dataset called ReS. Our results on ReS demonstrate the quality of repositioned image generation.
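
To make the sub-tasks concrete, the following is a minimal sketch of the mask bookkeeping that subject repositioning implies. It is not the SEELE implementation: the function names (`shift`, `repositioning_masks`, `naive_reposition`) are hypothetical, the subject mask is assumed to be given (for example by a segmentation model), and the vacated region is only zeroed out as a placeholder where a prompt-guided inpainting model would generate content.

```python
import numpy as np

def shift(array, dy, dx, fill=0):
    """Translate an H x W (or H x W x C) array by (dy, dx) without wrap-around."""
    out = np.full_like(array, fill)
    h, w = array.shape[:2]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the content is moved entirely out of the frame
    # Source and destination windows that stay inside the image bounds.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = array[src_y, src_x]
    return out

def repositioning_masks(subject_mask, dy, dx):
    """Return (moved, void): where the subject lands, and what it leaves behind."""
    moved = shift(subject_mask, dy, dx, fill=False)
    # Vacated pixels that the moved subject does not cover again.
    void = subject_mask & ~moved
    return moved, void

def naive_reposition(image, subject_mask, dy, dx):
    """Paste the subject at its new location and mark the void for inpainting."""
    moved, void = repositioning_masks(subject_mask, dy, dx)
    out = image.copy()
    out[void] = 0  # placeholder only; an inpainting model would fill this region
    out[moved] = shift(image, dy, dx)[moved]
    return out, void

if __name__ == "__main__":
    image = np.arange(1, 17).reshape(4, 4)
    mask = np.zeros((4, 4), dtype=bool)
    mask[0:2, 0:2] = True  # assumed subject mask: the top-left 2 x 2 block
    result, void = naive_reposition(image, mask, dy=0, dx=2)
    print(result)
    print(void)
```

In this sketch the `void` mask corresponds to the region that needs to be filled, while the `moved` mask marks where the subject would then need completion and blending; how those regions are actually generated is left to the model.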
