ε-VAE: Denoising as Visual Decoding

October 5, 2024
Authors: Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu
cs.AI

Abstract

In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space. For high-dimensional visual data, it reduces redundancy and emphasizes key features for high-quality generation. Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input. In this work, we offer a new perspective by proposing denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder. We evaluate our approach by assessing both reconstruction (rFID) and generation quality (FID), comparing it to state-of-the-art autoencoding approaches. We hope this work offers new insights into integrating iterative generation and autoencoding for improved compression and generation.
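
To make the core idea concrete, below is a minimal PyTorch sketch of "denoising as decoding": an encoder compresses the image into a latent code, and instead of a single-step decoder, a conditional denoiser iteratively refines Gaussian noise into the reconstruction while being guided by that latent. The module names (`Encoder`, `ConditionalDenoiser`), the tiny architectures, and the DDPM-style noise schedule are illustrative assumptions for exposition only, not the paper's actual ε-VAE implementation.

```python
# Hypothetical sketch of "denoising as decoding" (not the paper's actual code).
# Encoder -> latent z; a conditional denoiser turns pure noise into the image,
# conditioned on z, via an iterative DDPM-style reverse process.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps an image to a compact latent code (placeholder architecture)."""
    def __init__(self, in_ch=3, latent_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class ConditionalDenoiser(nn.Module):
    """Predicts the noise in x_t, conditioned on the encoder latent z.
    Timestep conditioning is omitted here for brevity."""
    def __init__(self, in_ch=3, latent_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + latent_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, in_ch, 3, padding=1),
        )

    def forward(self, x_t, z, t):
        # Upsample the latent to image resolution and concatenate as conditioning.
        z_up = nn.functional.interpolate(z, size=x_t.shape[-2:], mode="nearest")
        return self.net(torch.cat([x_t, z_up], dim=1))


@torch.no_grad()
def decode_by_denoising(encoder, denoiser, x, num_steps=50):
    """Iteratively refine noise into a reconstruction, guided by the latent z."""
    z = encoder(x)
    x_t = torch.randn_like(x)                      # start from pure noise
    betas = torch.linspace(1e-4, 0.02, num_steps)  # toy linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for t in reversed(range(num_steps)):
        eps = denoiser(x_t, z, t)                  # predicted noise at step t
        # Standard DDPM posterior-mean update (noise added only for t > 0).
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x_t = (x_t - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x_t = x_t + torch.sqrt(betas[t]) * torch.randn_like(x_t)
    return x_t
```

In this sketch the latent acts purely as conditioning for the reverse diffusion, which is what distinguishes the approach from a conventional autoencoder decoder that maps the latent to pixels in a single forward pass.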
