RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs

Abstract

Blind face restoration aims to recover high-quality face images from inputs with unknown degradations. Current algorithms mainly introduce priors to supply high-quality details and have achieved impressive progress. However, most of them ignore the abundant contextual information in the face and its interplay with the priors, leading to sub-optimal performance. Moreover, they pay little attention to the gap between synthetic and real-world scenarios, which limits their robustness and generalization in real-world applications. In this work, we propose RestoreFormer++, which, on the one hand, introduces fully-spatial attention mechanisms to model contextual information and its interplay with the priors, and, on the other hand, explores an extending degrading model that generates more realistic degraded face images to alleviate the synthetic-to-real-world gap. Compared with current algorithms, RestoreFormer++ has several crucial benefits. First, instead of the multi-head self-attention used in the conventional vision transformer, we introduce multi-head cross-attention over multi-scale features to fully explore the spatial interactions between corrupted information and high-quality priors. This helps RestoreFormer++ restore face images with higher realness and fidelity. Second, in contrast to a recognition-oriented dictionary, we learn a reconstruction-oriented dictionary as priors, which contains more diverse high-quality facial details and better accords with the restoration target. Third, we introduce an extending degrading model that covers more realistic degradation scenarios for training-data synthesis, and thus helps enhance the robustness and generalization of RestoreFormer++. Extensive experiments show that RestoreFormer++ outperforms state-of-the-art algorithms on both synthetic and real-world datasets.
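
The core interaction between degraded features and high-quality priors can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the priors are obtained by nearest-neighbour lookup of the degraded features in a learned dictionary (a VQ-style codebook), that the attention is standard scaled dot-product attention, and it shows a single scale with random weights. The names `quantize` and `multi_head_cross_attention`, the feature shapes, and the dictionary size are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def quantize(features, dictionary):
    """Hypothetical prior lookup: replace each feature vector (N, C) with its
    nearest dictionary entry (K, C) under squared L2 distance."""
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d2.argmin(axis=1)
    return dictionary[idx], idx


def multi_head_cross_attention(z_deg, z_prior, w_q, w_k, w_v, w_o, num_heads):
    """Queries come from degraded features; keys and values come from the
    undegraded priors. Every spatial position attends to every prior position."""
    n, c = z_deg.shape
    head_dim = c // num_heads
    q = (z_deg @ w_q).reshape(n, num_heads, head_dim).transpose(1, 0, 2)      # (H, N, D)
    k = (z_prior @ w_k).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)   # (H, M, D)
    v = (z_prior @ w_v).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)   # (H, M, D)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim), axis=-1)     # (H, N, M)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)                         # merge heads
    return out @ w_o, attn


# Toy usage: a 4x4 degraded feature map with 8 channels, flattened to 16 positions.
rng = np.random.default_rng(0)
h, w, c, k_entries, heads = 4, 4, 8, 16, 2
z_deg = rng.standard_normal((h * w, c))
dictionary = rng.standard_normal((k_entries, c))  # stand-in for a learned dictionary
z_prior, idx = quantize(z_deg, dictionary)
weights = [rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4)]
fused, attn = multi_head_cross_attention(z_deg, z_prior, *weights, num_heads=heads)
print(fused.shape, attn.shape)  # (16, 8) (2, 16, 16)
```

Because the queries come from the degraded features while the keys and values come from the undegraded priors, each position can draw high-quality detail from any position in the prior map; with self-attention, keys and values would instead come from the corrupted features themselves. Residual connections, normalization, and the multi-scale fusion mentioned above are omitted for brevity.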