DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

Abstract

Image restoration (IR) in real-world scenarios presents significant challenges due to the lack of high-capacity models and comprehensive datasets. To tackle these issues, we present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets, which typically comprise only a few thousand images and thus offer limited generalizability for larger models. GenIR streamlines the process into three stages: image-text pair construction, dual-prompt based fine-tuning, and data generation and filtering. This approach circumvents the laborious data crawling process, ensuring copyright compliance and providing a cost-effective, privacy-safe solution for IR dataset construction. The result is a large-scale dataset of one million high-quality images. Our second contribution, DreamClear, is a DiT-based image restoration model. It utilizes the generative priors of text-to-image (T2I) diffusion models and the robust perceptual capabilities of multi-modal large language models (MLLMs) to achieve photorealistic restoration. To boost the model's adaptability to diverse real-world degradations, we introduce the Mixture of Adaptive Modulator (MoAM). It employs token-wise degradation priors to dynamically integrate various restoration experts, thereby expanding the range of degradations the model can address. Our exhaustive experiments confirm DreamClear's superior performance, underlining the efficacy of our dual strategy for real-world image restoration. Code and pre-trained models will be available at: https://github.com/shallowdream204/DreamClear.
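
The abstract describes MoAM only at a high level, so the following is a minimal sketch of the general idea of token-wise expert mixing, not the DreamClear implementation. It assumes a linear router over each token's degradation prior, softmax gating, and experts that are plain functions on feature vectors; the names `mix_experts_per_token`, `router_weights`, and the two toy experts are hypothetical and do not come from the paper or its repository.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts_per_token(
    tokens: Sequence[Vector],
    degradation_priors: Sequence[Vector],
    router_weights: Sequence[Vector],
    experts: Sequence[Expert],
) -> List[Vector]:
    """Blend expert outputs separately for every token.

    Assumption (illustrative only): the gate for expert k is the softmax of a
    dot product between row k of router_weights and the token's prior.
    """
    mixed: List[Vector] = []
    for token, prior in zip(tokens, degradation_priors):
        # One routing logit per expert, computed from this token's prior.
        logits = [sum(w * p for w, p in zip(row, prior)) for row in router_weights]
        gates = softmax(logits)
        outputs = [expert(token) for expert in experts]
        # Gate-weighted sum of the expert outputs, dimension by dimension.
        mixed.append(
            [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(len(token))]
        )
    return mixed


# Two toy "restoration experts" standing in for learned modules.
def double_features(token: Vector) -> Vector:
    return [2.0 * x for x in token]


def shift_features(token: Vector) -> Vector:
    return [x + 1.0 for x in token]


if __name__ == "__main__":
    result = mix_experts_per_token(
        tokens=[[1.0, 2.0]],
        degradation_priors=[[0.0, 0.0]],  # uninformative prior: equal gates
        router_weights=[[1.0, 0.0], [0.0, 1.0]],
        experts=[double_features, shift_features],
    )
    print(result)  # [[2.0, 3.5]]
```

With an uninformative prior the two experts are averaged; a prior that strongly favors one expert drives the output toward that expert alone, which is the sense in which the mixing is dynamic and token-wise in this sketch.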
