ToDo: Token Downsampling for Efficient Generation of High-Resolution Images

Abstract

Attention mechanisms have been crucial for image diffusion models; however, their quadratic computational complexity limits the image sizes that can be processed within reasonable time and memory constraints. This paper investigates the importance of dense attention in generative image models and shows that these models often contain redundant features, which makes them suitable for sparser attention mechanisms. We propose ToDo, a novel training-free method that downsamples key and value tokens to accelerate Stable Diffusion inference by up to 2x for common image sizes and by up to 4.5x or more for high resolutions such as 2048x2048. We demonstrate that our approach balances throughput and fidelity better than previous methods.
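The abstract names the core idea (full-resolution queries attending to downsampled keys and values) without showing how it changes the attention computation. The NumPy sketch below illustrates that idea under stated assumptions: tokens form a row-major 2D grid whose sides are divisible by the downsampling factor, downsampling is plain average pooling, and the function names (`downsample_tokens`, `downsampled_kv_attention`) are hypothetical. It is not the authors' implementation; the paper's actual downsampling operator and the layers it targets may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def downsample_tokens(tokens, height, width, factor):
    """Average-pool a (height * width, dim) token sequence on its 2D grid.

    Assumption (illustrative): tokens are stored row-major and both height
    and width are divisible by `factor`.
    """
    dim = tokens.shape[-1]
    grid = tokens.reshape(height // factor, factor, width // factor, factor, dim)
    pooled = grid.mean(axis=(1, 3))  # average each factor x factor block
    return pooled.reshape(-1, dim)


def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # shape (n_queries, n_keys)
    return softmax(scores, axis=-1) @ v


def downsampled_kv_attention(x, height, width, w_q, w_k, w_v, factor=2):
    # Queries keep full resolution, so the output has one token per input token.
    q = x @ w_q
    # Keys and values come from a spatially downsampled copy of x, shrinking
    # the score matrix from (N, N) to (N, N / factor**2).
    x_small = downsample_tokens(x, height, width, factor)
    k = x_small @ w_k
    v = x_small @ w_v
    return attention(q, k, v)


# Usage on random data: an 8x8 token grid with 16 channels.
rng = np.random.default_rng(0)
height = width = 8
dim = 16
x = rng.standard_normal((height * width, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

out = downsampled_kv_attention(x, height, width, w_q, w_k, w_v, factor=2)
print(out.shape)  # (64, 16)
```

With `factor=2` the score matrix has 64 x 16 entries instead of 64 x 64, a 4x reduction that grows in absolute terms as the token count rises. Because average pooling and a bias-free linear projection are both linear, pooling `x` before projecting yields the same keys and values as projecting first, and pooling first is cheaper. The output shape is unchanged, so a layer like this can replace full attention without altering the surrounding network's tensor shapes.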
