
Efficient Text-Guided Convolutional Adapter for the Diffusion Model

February 16, 2026
Authors: Aryan Das, Koushik Biswas, Swalpa Kumar Roy, Badri Narayana Patro, Vinay Kumar Verma
cs.AI

Abstract

We introduce the Nexus Adapters, novel text-guided, efficient adapters for diffusion-based frameworks that target Structure-Preserving Conditional Generation (SPCG). Recently, structure-preserving methods have achieved promising results in conditional image generation by using a base model for prompt conditioning and an adapter for structural input, such as sketches or depth maps. These approaches are highly inefficient, and the adapter sometimes requires as many parameters as the base architecture. Because diffusion models are already costly to train, doubling the parameter count is often impractical. Moreover, the adapter in these approaches is not aware of the input prompt, so it is optimized only for the structural input and not for the prompt. To overcome these challenges, we propose two efficient adapters, Nexus Prime and Nexus Slim, which are guided by both the prompt and the structural input. Each Nexus Block incorporates cross-attention mechanisms to enable rich multimodal conditioning, so the proposed adapters understand the input prompt better while preserving structure. Extensive experiments show that the Nexus Prime adapter significantly improves performance while requiring only 8M more parameters than the baseline, T2I-Adapter. We also introduce a lightweight Nexus Slim adapter, which has 18M fewer parameters than T2I-Adapter and still achieves state-of-the-art results. Code: https://github.com/arya-domain/Nexus-Adapters
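The idea of letting structural features attend to the prompt can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (see the linked repository for that). It assumes a single attention head, leaves out the convolutional layers, and uses random weights. Every name in it (`cross_attention`, `project`, `random_matrix`, the dimension values) is hypothetical and chosen only for illustration. Queries come from flattened structure features, keys and values come from prompt token embeddings, and the attended result is added back to the structure features as a residual.

```python
import math
import random


def project(rows, weight):
    """Apply a linear map to each row: (n x d_in) times (d_in x d_out)."""
    d_out = len(weight[0])
    return [
        [sum(row[i] * weight[i][j] for i in range(len(row))) for j in range(d_out)]
        for row in rows
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(structure_feats, prompt_tokens, w_q, w_k, w_v):
    """Fuse prompt information into structure features (illustrative only).

    structure_feats: one feature vector per spatial position (queries).
    prompt_tokens:   one embedding per prompt token (keys and values).
    """
    queries = project(structure_feats, w_q)
    keys = project(prompt_tokens, w_k)
    values = project(prompt_tokens, w_v)  # projected back to the feature width
    scale = 1.0 / math.sqrt(len(queries[0]))
    fused = []
    for feat, q in zip(structure_feats, queries):
        # Attention weights of this spatial position over all prompt tokens.
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        attended = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(feat))
        ]
        # Residual connection keeps the original structural signal.
        fused.append([x + a for x, a in zip(feat, attended)])
    return fused


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned weights or real features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_feat, d_text, d_attn = 3, 5, 4          # assumed toy dimensions
feats = random_matrix(rng, 4, d_feat)     # e.g. a flattened 2x2 feature map
tokens = random_matrix(rng, 2, d_text)    # two prompt token embeddings
w_q = random_matrix(rng, d_feat, d_attn)
w_k = random_matrix(rng, d_text, d_attn)
w_v = random_matrix(rng, d_text, d_feat)
out = cross_attention(feats, tokens, w_q, w_k, w_v)
print(len(out), len(out[0]))
```

The residual connection means that if the value projection contributes nothing, the structure features pass through unchanged, which matches the stated goal of improving prompt awareness while preserving structure.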