

AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

March 30, 2026
Authors: Haozhe Qi, Kevin Qu, Mahdi Rad, Rui Wang, Alexander Mathis, Marc Pollefeys
cs.AI

Abstract

Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting frames/tokens within short clips, but they lack a principled mechanism to (i) compare relevance across distant video clips and (ii) stop processing once sufficient evidence has been gathered. We propose AdaptToken, a training-free framework that turns an MLLM's self-uncertainty into a global control signal for long-video token selection. AdaptToken splits a video into groups, extracts cross-modal attention to rank tokens within each group, and uses the model's response entropy to estimate each group's prompt relevance. This entropy signal enables a global token budget allocation across groups and further supports early stopping (AdaptToken-Lite), skipping the remaining groups when the model becomes sufficiently certain. Across four long-video benchmarks (VideoMME, LongVideoBench, LVBench, and MLVU) and multiple base MLLMs (7B-72B), AdaptToken consistently improves accuracy (e.g., +6.7 on average over Qwen2.5-VL 7B) and continues to benefit from extremely long inputs (up to 10K frames), while AdaptToken-Lite reduces inference time by about half with comparable performance. Project page: https://haozheqi.github.io/adapt-token
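The control loop described in the abstract (per-group entropy scoring, global token-budget allocation, and AdaptToken-Lite early stopping) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the softmax-over-negative-entropy weighting, and the stopping threshold are all assumptions made for clarity.

```python
import math

def response_entropy(probs):
    """Shannon entropy of the model's answer distribution.

    Lower entropy means the model is more certain, which AdaptToken
    treats as a signal that the current group is prompt-relevant.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)

def allocate_budget(group_entropies, total_budget):
    """Distribute a global token budget across video groups.

    Groups where the model is more confident (lower entropy) receive
    larger budgets. The softmax over negative entropy is an
    illustrative weighting choice, not the paper's exact scheme.
    """
    weights = [math.exp(-h) for h in group_entropies]
    z = sum(weights)
    return [round(total_budget * w / z) for w in weights]

def early_stop_index(group_entropies, threshold=0.5):
    """AdaptToken-Lite-style early stopping (hypothetical threshold).

    Returns how many groups to process before skipping the rest:
    once a group's response entropy falls below the threshold, the
    model is deemed sufficiently certain and processing halts.
    """
    for i, h in enumerate(group_entropies):
        if h < threshold:
            return i + 1  # groups [0..i] processed; remainder skipped
    return len(group_entropies)
```

For example, with per-group entropies `[1.2, 0.8, 0.3, 0.9]` and a 1,000-token budget, the lowest-entropy (most relevant) group receives the largest share, and `early_stop_index` halts after the third group, mirroring how AdaptToken-Lite skips remaining groups to roughly halve inference time.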
April 1, 2026