The Collapse of Patches
November 27, 2025
Authors: Wei Guo, Shunqi Mao, Zhuonan Liang, Heng Wang, Weidong Cai
cs.AI
Abstract
Observing certain patches in an image reduces the uncertainty of the others. Their realization lowers the entropy of the feature distribution of each remaining patch, analogous to the collapse of a particle's wave function in quantum mechanics. This phenomenon can intuitively be called patch collapse. To identify which patches are relied on most during a target region's collapse, we learn an autoencoder that softly selects a subset of patches to reconstruct each target patch. Building a graph from these learned dependencies and computing each patch's PageRank score reveals the optimal patch order in which to realize an image. We show that respecting this order benefits various masked image modeling methods. First, autoregressive image generation improves when the state-of-the-art model MAR is retrained to follow this order. Next, we introduce a new setup for image classification that exposes Vision Transformers only to high-rank patches in the collapse order. Seeing 22% of such patches is sufficient to achieve high accuracy. With these experiments, we propose patch collapse as a novel image modeling perspective that promotes vision efficiency. Our project is available at https://github.com/wguo-ai/CoP.
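The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the learned soft selections are available as a matrix `weights`, where `weights[i][j]` is how strongly target patch i relies on patch j, and it treats each reliance as a directed edge from i to j so that heavily relied-on patches accumulate a high PageRank score. The function name `pagerank_order`, the damping factor, the iteration count, and the toy matrix are all hypothetical choices made for illustration.

```python
def pagerank_order(weights, damping=0.85, iters=100):
    """Rank patches by PageRank over a dependency matrix (illustrative sketch).

    weights[i][j] is assumed to be the soft-selection weight with which
    target patch i relies on patch j. Returns (scores, order), where order
    lists patch indices from highest to lowest score.
    """
    n = len(weights)
    # Row-normalise so that row i is a distribution over the patches i relies on.
    # A patch that relies on nothing spreads its score uniformly (dangling node).
    rows = []
    for row in weights:
        total = sum(row)
        rows.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    rank = [1.0 / n] * n
    for _ in range(iters):
        # Teleport term, then score flowing from each patch to its dependencies.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new_rank[j] += damping * rank[i] * rows[i][j]
        rank = new_rank

    # Stable sort: ties keep their original index order.
    order = sorted(range(n), key=lambda j: -rank[j])
    return rank, order


# Toy dependency matrix for four patches (made-up values).
weights = [
    [0.0, 0.0, 0.0, 0.0],  # patch 0 relies on no other patch
    [1.0, 0.0, 0.0, 0.0],  # patch 1 relies on patch 0
    [1.0, 0.0, 0.0, 0.0],  # patch 2 relies on patch 0
    [0.5, 0.5, 0.0, 0.0],  # patch 3 relies on patches 0 and 1
]
rank, order = pagerank_order(weights)

# Keep roughly the top 22% of patches, as in the classification setup.
keep = order[: max(1, round(0.22 * len(order)))]
```

In this toy case patch 0 is relied on by every other patch, so it ranks first and is the only patch kept at a 22% budget. How the actual dependency weights are produced by the autoencoder, and how the order is consumed by MAR or a Vision Transformer, is not specified in the abstract and is left out here.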