HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Abstract

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches that rely on attribute guidance or human feedback to build datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure high quality, diverse examples are first collected online and expanded, then used to create high-quality diptychs featuring input and output images with detailed text prompts; post-processing then ensures precise alignment. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example, an InstructPix2Pix model fine-tuned on HQ-Edit can attain state-of-the-art image editing performance, even surpassing models fine-tuned with human-annotated data. The project page is https://thefllood.github.io/HQEdit_web.
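
To make the diptych idea concrete, the sketch below shows one way a two-panel image could be separated into an input image and an output image. It is only an illustration under stated assumptions: the abstract does not describe the panel layout or the actual post-processing steps, so the side-by-side, equal-width layout and the helper name `split_diptych` are hypothetical, and images are modeled as plain nested lists rather than real image files.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels


def split_diptych(diptych: Image) -> Tuple[Image, Image]:
    """Split a side-by-side diptych into (input_image, output_image).

    Assumption (not stated in the abstract): the two panels sit left
    and right and have exactly the same width.
    """
    if not diptych or not diptych[0]:
        raise ValueError("diptych must be a non-empty 2D grid of pixels")
    width = len(diptych[0])
    if any(len(row) != width for row in diptych):
        raise ValueError("all rows must have the same width")
    if width % 2 != 0:
        raise ValueError("width must be even to split into equal panels")

    half = width // 2
    left = [row[:half] for row in diptych]    # hypothetical input panel
    right = [row[half:] for row in diptych]   # hypothetical output panel
    return left, right


# Toy 2x4 "image": the left panel is black, the right panel is white.
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
toy = [[BLACK, BLACK, WHITE, WHITE] for _ in range(2)]
input_image, output_image = split_diptych(toy)
```

A real pipeline would operate on decoded image files and would also need the alignment step the abstract mentions, which this sketch does not attempt.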

