RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs

Abstract

Blind face restoration aims to recover high-quality face images from inputs with unknown degradations. Current algorithms mainly introduce priors to supply high-quality details and have achieved impressive progress. However, most of them ignore the abundant contextual information in the face and its interplay with the priors, which leads to sub-optimal performance. They also pay little attention to the gap between synthetic and real-world scenarios, which limits their robustness and generalization in real-world applications. In this work, we propose RestoreFormer++, which on the one hand introduces fully-spatial attention mechanisms to model the contextual information and its interplay with the priors, and on the other hand explores an extending degrading model that generates more realistic degraded face images to narrow the synthetic-to-real-world gap. Compared with current algorithms, RestoreFormer++ has several crucial benefits. First, instead of the multi-head self-attention used in the traditional visual transformer, we introduce multi-head cross-attention over multi-scale features to fully explore spatial interactions between corrupted information and high-quality priors. This helps RestoreFormer++ restore face images with higher realness and fidelity. Second, in contrast to a recognition-oriented dictionary, we learn a reconstruction-oriented dictionary as priors, which contains more diverse high-quality facial details and better matches the restoration target. Third, we introduce an extending degrading model that covers more realistic degradation scenarios for synthesizing training data, which enhances the robustness and generalization of RestoreFormer++. Extensive experiments show that RestoreFormer++ outperforms state-of-the-art algorithms on both synthetic and real-world datasets.
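
The cross-attention idea named in the abstract and title (queries taken from degraded features, keys and values taken from high-quality priors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `multi_head_cross_attention`, the random projection matrices, the feature sizes, and the restriction to a single scale are all assumptions made for illustration, and details the abstract does not state (multi-scale fusion, normalization, how priors are looked up in the dictionary) are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(degraded, prior, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical helper: queries from degraded features, keys/values from priors.

    degraded: (n, d) array, one row per spatial location of the degraded features.
    prior:    (m, d) array, one row per high-quality prior feature.
    w_q, w_k, w_v, w_o: (d, d) projection matrices (assumed, not from the paper).
    """
    n, d = degraded.shape
    m = prior.shape[0]
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads

    # Project, then split channels into heads: (heads, tokens, head_dim).
    q = (degraded @ w_q).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    k = (prior @ w_k).reshape(m, num_heads, head_dim).transpose(1, 0, 2)
    v = (prior @ w_v).reshape(m, num_heads, head_dim).transpose(1, 0, 2)

    # Every degraded location attends over every prior location.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, n, m)
    weights = softmax(scores, axis=-1)
    attended = weights @ v                                  # (heads, n, head_dim)

    # Merge the heads back into one feature vector per location.
    merged = attended.transpose(1, 0, 2).reshape(n, d)
    return merged @ w_o, weights


# Toy usage with random data standing in for real feature maps.
rng = np.random.default_rng(0)
d, heads = 32, 4
degraded = rng.standard_normal((8 * 8, d))  # flattened 8x8 degraded feature map
prior = rng.standard_normal((8 * 8, d))     # flattened 8x8 prior feature map
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

out, weights = multi_head_cross_attention(degraded, prior, w_q, w_k, w_v, w_o, heads)
print(out.shape, weights.shape)
```

In self-attention, `q`, `k`, and `v` would all be projected from the same input. In this sketch, only the queries depend on the degraded input, so each output location is a weighted mixture of prior features.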
