RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs

Abstract

Blind face restoration aims to recover high-quality face images from inputs with unknown degradations. Current algorithms mainly introduce priors to supply high-quality details and have made impressive progress. However, most of them ignore the abundant contextual information in the face and its interplay with the priors, which leads to sub-optimal performance. Moreover, they pay little attention to the gap between synthetic and real-world scenarios, which limits their robustness and generalization in real-world applications. In this work, we propose RestoreFormer++, which on the one hand introduces fully-spatial attention mechanisms to model the contextual information and its interplay with the priors, and on the other hand explores an extending degrading model that generates more realistic degraded face images to narrow the synthetic-to-real-world gap. Compared with current algorithms, RestoreFormer++ has several crucial benefits. First, instead of using a multi-head self-attention mechanism like the traditional visual transformer, we introduce multi-head cross-attention over multi-scale features to fully explore spatial interactions between corrupted information and high-quality priors. This enables RestoreFormer++ to restore face images with higher realness and fidelity. Second, in contrast to a recognition-oriented dictionary, we learn a reconstruction-oriented dictionary as priors, which contains more diverse high-quality facial details and better matches the restoration target. Third, we introduce an extending degrading model that covers more realistic degradation scenarios when synthesizing training data, which helps enhance the robustness and generalization of RestoreFormer++. Extensive experiments show that RestoreFormer++ outperforms state-of-the-art algorithms on both synthetic and real-world datasets.
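
The multi-head cross-attention described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. It assumes that queries are projected from the degraded features and that keys and values are projected from the high-quality prior features, as the "undegraded key-value pairs" of the title suggest. The function names, the random stand-in weights, the single feature scale, and the residual addition at the end are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection (assumption: real weights are trained)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(degraded, prior, num_heads, rng):
    """Queries come from degraded features; keys and values come from priors.

    degraded: N x C list of rows, one row per spatial position
    prior:    M x C list of rows of high-quality prior features
    returns:  (N x C fused features, per-head N x M attention weights)
    """
    channels = len(degraded[0])
    assert channels % num_heads == 0, "channels must split evenly across heads"
    head_dim = channels // num_heads

    # Hypothetical projection weights; a trained model would learn these.
    w_q, w_k, w_v, w_o = (random_matrix(channels, channels, rng) for _ in range(4))

    q = matmul(degraded, w_q)  # N x C
    k = matmul(prior, w_k)     # M x C
    v = matmul(prior, w_v)     # M x C

    head_outputs, all_weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        k_h_t = [list(col) for col in zip(*k_h)]  # head_dim x M
        scores = matmul(q_h, k_h_t)               # N x M
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))  # N x head_dim
        all_weights.append(weights)

    # Concatenate the heads per position, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(degraded))]
    projected = matmul(concat, w_o)

    # Assumption: fuse by adding the attended prior back onto the degraded features.
    fused = [[d + p for d, p in zip(d_row, p_row)]
             for d_row, p_row in zip(degraded, projected)]
    return fused, all_weights


rng = random.Random(0)
degraded = [[rng.random() for _ in range(8)] for _ in range(6)]  # 6 positions, 8 channels
prior = [[rng.random() for _ in range(8)] for _ in range(10)]    # 10 prior entries
fused, weights = cross_attention(degraded, prior, num_heads=2, rng=rng)
print(len(fused), len(fused[0]))
```

Each row of `weights[h]` is a probability distribution over the prior entries, so every spatial position of the degraded input can draw on any prior feature. That is the sense in which this kind of attention is fully spatial. The abstract applies it over multi-scale features; this sketch uses a single scale for brevity.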
