Efficient Text-Guided Convolutional Adapter for the Diffusion Model

Abstract

We introduce Nexus Adapters, novel text-guided, efficient adapters for diffusion-based frameworks for Structure-Preserving Conditional Generation (SPCG). Recent structure-preserving methods have achieved promising results in conditional image generation by using a base model for prompt conditioning and an adapter for structural inputs such as sketches or depth maps. These approaches are highly inefficient, sometimes requiring an adapter with as many parameters as the base architecture. Because the diffusion model is itself costly, doubling the parameter count means that training is not always feasible. Moreover, in these approaches the adapter is not aware of the input prompt, so it is optimized only for the structural input and not for the prompt. To overcome these challenges, we propose two efficient adapters, Nexus Prime and Nexus Slim, which are guided by both the prompt and the structural input. Each Nexus Block incorporates cross-attention mechanisms to enable rich multimodal conditioning, so the proposed adapters better understand the input prompt while preserving structure. Extensive experiments show that the Nexus Prime adapter significantly improves performance while requiring only 8M more parameters than the baseline, T2I-Adapter. We also introduce a lightweight Nexus Slim adapter with 18M fewer parameters than T2I-Adapter, which still achieves state-of-the-art results. Code: https://github.com/arya-domain/Nexus-Adapters
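
To illustrate the kind of prompt-aware conditioning the abstract describes, the following is a minimal, dependency-free sketch of single-head cross-attention in which structural features act as queries and prompt token embeddings act as keys and values. It is not the authors' implementation: the function names `softmax` and `cross_attention` are hypothetical, the learned query/key/value projections are assumed to be identity maps for brevity, and the residual connection is an assumption. In a real adapter the structural features would presumably come from convolutional layers and the tokens from a text encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(struct_feats, text_tokens):
    """Let each structural feature attend to the prompt tokens.

    struct_feats: list of N vectors (queries), e.g. a flattened feature map
                  of a sketch or depth map.
    text_tokens:  list of T vectors (keys and values) of the same dimension.
    Assumption: projections are identity maps; a trained block would use
    learned linear layers here.
    """
    dim = len(struct_feats[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in struct_feats:
        # Scaled dot-product similarity between this query and every token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in text_tokens]
        weights = softmax(scores)
        # Attention-weighted mixture of the token vectors.
        attended = [sum(w * tok[j] for w, tok in zip(weights, text_tokens))
                    for j in range(dim)]
        # Residual connection (assumed) keeps the structural signal intact.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused
```

Under these assumptions, the output has one fused vector per structural feature, so it can be reshaped back to the spatial layout of the structure input while carrying information from the prompt.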
