

FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation

June 5, 2025
Authors: Huihan Wang, Zhiwen Yang, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu
cs.AI

Abstract

Synthesizing high-quality dynamic medical videos remains a significant challenge due to the need for modeling both spatial consistency and temporal dynamics. Existing Transformer-based approaches face critical limitations, including insufficient channel interactions, high computational complexity from self-attention, and coarse denoising guidance from timestep embeddings when handling varying noise levels. In this work, we propose FEAT, a full-dimensional efficient attention Transformer, which addresses these issues through three key innovations: (1) a unified paradigm with sequential spatial-temporal-channel attention mechanisms to capture global dependencies across all dimensions, (2) a linear-complexity design for attention mechanisms in each dimension, utilizing weighted key-value attention and global channel attention, and (3) a residual value guidance module that provides fine-grained pixel-level guidance to adapt to different noise levels. We evaluate FEAT on standard benchmarks and downstream tasks, demonstrating that FEAT-S, with only 23% of the parameters of the state-of-the-art model Endora, achieves comparable or even superior performance. Furthermore, FEAT-L surpasses all comparison methods across multiple datasets, showcasing both superior effectiveness and scalability. Code is available at https://github.com/Yaziwel/FEAT.
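To make the first contribution concrete, the sketch below illustrates what a sequential spatial-temporal-channel attention block over video tokens can look like. It is a minimal PyTorch illustration of the general factorization idea, not the authors' implementation: the module names, tensor layout, use of standard softmax multi-head attention, and the pooled-gating stand-in for global channel attention are all assumptions; FEAT's weighted key-value attention and residual value guidance module are not reproduced here (see the official repository for the actual code).

```python
# Minimal sketch (assumptions, not the FEAT code): attention applied
# sequentially along the spatial, temporal, and channel axes of a
# video feature tensor of shape (batch, frames, spatial tokens, channels).
import torch
import torch.nn as nn


class FactorizedVideoAttention(nn.Module):
    """Sequential spatial -> temporal -> channel interaction over video tokens."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stand-in for global channel attention: a learned projection of
        # globally pooled features used as a channel gate.
        self.channel_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, C) -- batch, frames, spatial tokens, channels
        b, t, n, c = x.shape

        # Spatial attention: tokens within each frame attend to each other.
        xs = x.reshape(b * t, n, c)
        xs = xs + self.spatial_attn(xs, xs, xs, need_weights=False)[0]
        x = xs.reshape(b, t, n, c)

        # Temporal attention: each spatial location attends across frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, c)
        xt = xt + self.temporal_attn(xt, xt, xt, need_weights=False)[0]
        x = xt.reshape(b, n, t, c).permute(0, 2, 1, 3)

        # Channel interaction: gate channels with statistics pooled over all
        # frames and spatial positions (a linear-complexity approximation).
        gate = torch.softmax(self.channel_proj(x.mean(dim=(1, 2))), dim=-1)  # (B, C)
        return x * gate[:, None, None, :]


if __name__ == "__main__":
    block = FactorizedVideoAttention(dim=64)
    video_tokens = torch.randn(2, 8, 16, 64)  # 2 clips, 8 frames, 16 tokens, 64 channels
    print(block(video_tokens).shape)  # torch.Size([2, 8, 16, 64])
```

Note that, unlike the quadratic softmax attention used in this sketch, the paper reports linear-complexity attention in every dimension; the sketch only conveys the ordering of the three attention stages.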