ToDo: Token Downsampling for Efficient Generation of High-Resolution Images

February 21, 2024
Authors: Ethan Smith, Nayan Saxena, Aninda Saha
cs.AI

Abstract

The attention mechanism has been crucial for image diffusion models; however, its quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the importance of dense attention in generative image models, which often contain redundant features, making them suitable for sparser attention mechanisms. We propose ToDo, a novel training-free method that relies on downsampling of key and value tokens to accelerate Stable Diffusion inference by up to 2x for common sizes and up to 4.5x or more for high resolutions like 2048x2048. We demonstrate that our approach outperforms previous methods in balancing efficient throughput and fidelity.
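
As a rough illustration of the idea described in the abstract, the sketch below downsamples only the key and value tokens before attention, while queries keep every spatial position, so the output retains the full token count but the attention matrix shrinks. This is not the authors' implementation; the helper names (`downsample_kv`, `todo_style_attention`), the average-pooling operator, and the 2x factor are assumptions chosen for a minimal, self-contained example.

```python
# Minimal sketch of token-downsampled attention (illustrative only).
import math
import torch
import torch.nn.functional as F

def downsample_kv(x: torch.Tensor, h: int, w: int, factor: int = 2) -> torch.Tensor:
    """Spatially downsample a sequence of image tokens of shape (B, N, C)."""
    b, n, c = x.shape
    grid = x.transpose(1, 2).reshape(b, c, h, w)   # restore the 2D token grid
    grid = F.avg_pool2d(grid, kernel_size=factor)  # pooling is an assumed choice here
    return grid.flatten(2).transpose(1, 2)         # (B, N / factor^2, C)

def todo_style_attention(q, k, v, h, w, factor: int = 2):
    """Attention where only K and V are downsampled; Q keeps every token."""
    k = downsample_kv(k, h, w, factor)
    v = downsample_kv(v, h, w, factor)
    scale = 1.0 / math.sqrt(q.shape[-1])
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v                                 # output keeps the full token count

# Example: 64x64 latent grid (4096 tokens), 320-dim features
q = torch.randn(1, 64 * 64, 320)
k, v = torch.randn_like(q), torch.randn_like(q)
out = todo_style_attention(q, k, v, h=64, w=64, factor=2)
print(out.shape)  # torch.Size([1, 4096, 320])
```

With a downsampling factor of 2 per spatial dimension, the attention matrix has roughly 4x fewer entries, which is the source of the savings that grow with image resolution.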
