Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
June 10, 2026
Authors: Cheng-Yu Yang, Shao-Yuan Lo, Yu-Lun Liu
cs.AI
Abstract
Vision-language models (VLMs) project images into hundreds to thousands of visual tokens, making decoder inference expensive in both attention computation and KV-cache memory. Existing visual-token reduction methods largely follow a rank-and-remove paradigm: they score visual tokens, keep a compact subset, and permanently discard the rest. We show that this irreversible action is fragile because visual-token importance changes across decoder depth; tokens ranked low at one stage may become relevant in later layers, especially for grounding-sensitive queries. We propose Reroute, a training-free plug-in that replaces removal with recoverable routing. At each routing stage, selected visual tokens pass through the decoder blocks, while deferred tokens bypass the stage and re-enter the candidate pool at the next routing decision. Reroute reuses existing attention-score ranking rules and stage-wise schedules, preserving the theoretical TFLOPs and KV-cache budget class of the pruning method it augments. Across FastV, PDrop, and Nüwa variants on LLaVA-1.5 and Qwen backbones, Reroute improves grounding under aggressive token reduction while maintaining general VQA performance. These results suggest that VLM token reduction should be viewed not only as irreversible pruning but also as recoverable routing. Code is available at https://github.com/elmma/mllm-reroute/
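The contrast between rank-and-remove and recoverable routing can be illustrated with a toy sketch. This is not the authors' implementation: the function `run_stages`, the scalar "hidden states", the hand-written per-stage scores, and the fixed `keep` budget are all assumptions made for illustration. In the actual method the scores come from the attention-based ranking rule of the underlying pruning method, and details such as KV-cache handling are not covered here.

```python
from typing import Callable, Dict, List, Sequence


def run_stages(
    tokens: Sequence[float],
    stage_scores: Sequence[Sequence[float]],
    keep: int,
    block: Callable[[float], float],
    recoverable: bool,
) -> List[List[int]]:
    """Return, for each stage, the ids of the tokens that went through `block`.

    tokens       -- toy scalar hidden states, one per visual token (hypothetical)
    stage_scores -- per-stage importance scores, indexed by original token id
    keep         -- number of tokens processed at every stage (same budget in both modes)
    recoverable  -- True: deferred tokens stay in the pool; False: they are dropped
    """
    pool: Dict[int, float] = dict(enumerate(tokens))  # candidate pool: id -> state
    processed: List[List[int]] = []
    for scores in stage_scores:
        # Rank the current candidates by this stage's scores, highest first.
        ranked = sorted(pool, key=lambda i: scores[i], reverse=True)
        selected = ranked[:keep]
        # Only the selected tokens pay the compute cost of the stage.
        for i in selected:
            pool[i] = block(pool[i])
        if not recoverable:
            # Rank-and-remove: unselected tokens are gone for all later stages.
            pool = {i: pool[i] for i in selected}
        # Recoverable routing: unselected tokens bypass the stage unchanged
        # and remain candidates at the next routing decision.
        processed.append(sorted(selected))
    return processed


# Token 2 scores low at stage 1 but high at stage 2.
scores = [[0.9, 0.8, 0.1, 0.2], [0.2, 0.9, 0.8, 0.1]]
hidden = [0.0, 0.0, 0.0, 0.0]
step = lambda h: h + 1.0  # stand-in for a group of decoder blocks

removed = run_stages(hidden, scores, keep=2, block=step, recoverable=False)
rerouted = run_stages(hidden, scores, keep=2, block=step, recoverable=True)
print(removed)   # [[0, 1], [0, 1]]
print(rerouted)  # [[0, 1], [1, 2]]
```

In both modes exactly two tokens are processed per stage, so the toy compute budget is identical. The only difference is that under recoverable routing token 2, deferred at the first stage, can still be selected at the second.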