FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation

June 5, 2025
Authors: Huihan Wang, Zhiwen Yang, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu
cs.AI

Abstract

Synthesizing high-quality dynamic medical videos remains a significant challenge due to the need for modeling both spatial consistency and temporal dynamics. Existing Transformer-based approaches face critical limitations, including insufficient channel interactions, high computational complexity from self-attention, and coarse denoising guidance from timestep embeddings when handling varying noise levels. In this work, we propose FEAT, a full-dimensional efficient attention Transformer, which addresses these issues through three key innovations: (1) a unified paradigm with sequential spatial-temporal-channel attention mechanisms to capture global dependencies across all dimensions, (2) a linear-complexity design for attention mechanisms in each dimension, utilizing weighted key-value attention and global channel attention, and (3) a residual value guidance module that provides fine-grained pixel-level guidance to adapt to different noise levels. We evaluate FEAT on standard benchmarks and downstream tasks, demonstrating that FEAT-S, with only 23% of the parameters of the state-of-the-art model Endora, achieves comparable or even superior performance. Furthermore, FEAT-L surpasses all comparison methods across multiple datasets, showcasing both superior effectiveness and scalability. Code is available at https://github.com/Yaziwel/FEAT.
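
The abstract describes the architecture only at a high level. The sketch below is a minimal, illustrative PyTorch rendering of the sequential spatial-temporal-channel attention idea, not the authors' implementation (see the GitHub link above for the official code). The names FullDimAttentionBlock, AxisAttention, and linear_attention, the particular linear-attention form, and the squeeze-style global channel gate are assumptions made for illustration; the paper's weighted key-value attention and residual value guidance module are not reproduced here.

```python
# Illustrative sketch (assumed names and simplified attention forms), showing
# how attention can be applied sequentially over the spatial, temporal, and
# channel dimensions of a video latent shaped (B, T, H*W, C).
import torch
import torch.nn as nn


def linear_attention(q, k, v):
    """One common linear-complexity attention variant: normalize keys over the
    token axis, build a d x d key-value summary, then project queries through
    it. Cost is linear in the number of tokens."""
    k = k.softmax(dim=-2)               # normalize over tokens
    context = k.transpose(-2, -1) @ v   # (..., d, d) global summary
    return q @ context                  # (..., N, d)


class AxisAttention(nn.Module):
    """Single-head attention applied along one axis of the video tensor."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return self.proj(linear_attention(q, k, v))


class FullDimAttentionBlock(nn.Module):
    """Sequential spatial -> temporal -> channel attention (assumed layout)."""
    def __init__(self, dim):
        super().__init__()
        self.spatial = AxisAttention(dim)
        self.temporal = AxisAttention(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        # Global channel attention stand-in: pool all tokens, gate channels.
        self.channel_gate = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(),
            nn.Linear(dim // 4, dim), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (B, T, N, C), N = H * W
        B, T, N, C = x.shape
        # 1) Spatial attention over the N spatial tokens of each frame.
        s = x.reshape(B * T, N, C)
        s = s + self.spatial(self.norm1(s))
        x = s.reshape(B, T, N, C)
        # 2) Temporal attention over the T frames at each spatial location.
        t = x.permute(0, 2, 1, 3).reshape(B * N, T, C)
        t = t + self.temporal(self.norm2(t))
        x = t.reshape(B, N, T, C).permute(0, 2, 1, 3)
        # 3) Channel attention: global pooling over all tokens, then gating.
        g = self.channel_gate(self.norm3(x).mean(dim=(1, 2)))  # (B, C)
        return x * g[:, None, None, :]


if __name__ == "__main__":
    block = FullDimAttentionBlock(dim=64)
    video = torch.randn(2, 8, 16 * 16, 64)   # (B, T, H*W, C)
    print(block(video).shape)                 # torch.Size([2, 8, 256, 64])
```

The paper additionally conditions each block on the diffusion noise level through a residual value guidance module for fine-grained, pixel-level denoising guidance; that component is omitted from this sketch.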