
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors

January 25, 2026
Authors: Ido Andrew Atad, Itamar Zimerman, Shahar Katz, Lior Wolf
cs.AI

Abstract

Attention matrices are fundamental to transformer research, supporting a broad range of applications including interpretability, visualization, manipulation, and distillation. Yet most existing analyses focus on individual attention heads or layers, failing to account for the model's global behavior. While prior efforts have extended attention formulations across multiple heads via averaging and matrix multiplication, or have incorporated components such as normalization and FFNs, a unified and complete representation that encapsulates all transformer blocks is still lacking. We address this gap by introducing TensorLens, a novel formulation that captures the entire transformer as a single, input-dependent linear operator expressed through a high-order attention-interaction tensor. This tensor jointly encodes attention, FFNs, activations, normalizations, and residual connections, offering a theoretically coherent and expressive linear representation of the model's computation. TensorLens is theoretically grounded, and our empirical validation shows that it yields richer representations than previous attention-aggregation methods. Our experiments demonstrate that the attention tensor can serve as a powerful foundation for developing tools aimed at interpretability and model understanding. Our code is attached as supplementary material.
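To make the stated formulation concrete (the notation below is ours, not taken from the paper): for an input sequence $X = (x_1, \dots, x_n)$ with token embeddings $x_j \in \mathbb{R}^d$, reading the transformer as an input-dependent linear operator means the output tokens admit a decomposition of the form

$$y_i = \sum_{j=1}^{n} \mathcal{A}_{ij}(X)\, x_j, \qquad \mathcal{A}_{ij}(X) \in \mathbb{R}^{d \times d},$$

where $\mathcal{A}(X)$ is a fourth-order tensor over token and feature indices. Once evaluated at a given input, $\mathcal{A}(X)$ absorbs attention, FFNs, activations, normalizations, and residual paths into a single linear map, which is what distinguishes it from head- or layer-local attention matrices.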
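For intuition only, here is a minimal Python sketch (our own illustration, not the paper's released code) that probes such an input-dependent linear structure numerically: the Jacobian of a block's output with respect to its input, reshaped to (n, d, n, d), plays the role of a token-to-token coupling tensor. This is a local, first-order surrogate, not the exact construction TensorLens defines.

# Hypothetical illustration, not the paper's method: probe the
# input-dependent linear structure of one transformer block numerically.
# For a fixed input X, the Jacobian dY/dX reshaped to (n, d, n, d) acts as
# a fourth-order tensor whose (i, :, j, :) block couples token j to token i.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 4, 8  # toy sequence length and embedding width

block = nn.TransformerEncoderLayer(
    d_model=d, nhead=2, dim_feedforward=16, batch_first=True
)
block.eval()  # disable dropout so the Jacobian is deterministic

X = torch.randn(1, n, d)

# Jacobian of the block output w.r.t. its input: shape (1, n, d, 1, n, d).
J = torch.autograd.functional.jacobian(block, X)
A = J.reshape(n, d, n, d)  # A[i, :, j, :] is the d x d block coupling j -> i

# Collapsing each d x d block to its Frobenius norm gives an n x n map that
# can be read like an aggregated attention matrix over token pairs.
token_coupling = A.norm(dim=(1, 3))
print(token_coupling)

Unlike a single head's attention matrix, this coupling map already reflects the block's FFN, normalization, and residual paths at the probed input, which is the kind of whole-model view the abstract argues for.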