

AgentOCR: Reimagining Agent History via Optical Self-Compression

January 8, 2026
Authors: Lang Feng, Fuchao Yang, Feng Chen, Xin Cheng, Haiyang Xu, Zhenglin Wan, Ming Yan, Bo An
cs.AI

Abstract

Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR proposes segment optical caching: by decomposing history into hashable segments and maintaining a visual cache, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR introduces agentic self-compression, where the agent actively emits a compression rate and is trained with a compression-aware reward to adaptively balance task success and token efficiency. We conduct extensive experiments on challenging agentic benchmarks, ALFWorld and search-based QA. Remarkably, results demonstrate that AgentOCR preserves over 95% of text-based agent performance while substantially reducing token consumption (>50%), yielding consistent token and memory efficiency. Our further analysis validates a 20x rendering speedup from segment optical caching and the effective strategic balancing of self-compression.
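The segment optical caching idea described above can be sketched in a few lines: split the history into segments, key each segment by a content hash, and render a segment to an image only on a cache miss. The sketch below is a minimal illustration under assumed details (the class name `SegmentOpticalCache`, SHA-256 keys, and a stand-in `_render` that returns bytes instead of a real text-to-image renderer are all hypothetical, not the paper's implementation):

```python
import hashlib
from typing import Dict, List


class SegmentOpticalCache:
    """Cache rendered images of history segments, keyed by content hash."""

    def __init__(self) -> None:
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(segment: str) -> str:
        # Hashable segments: identical text always maps to the same key.
        return hashlib.sha256(segment.encode("utf-8")).hexdigest()

    @staticmethod
    def _render(segment: str) -> bytes:
        # Placeholder for an actual text-to-image renderer (the expensive step).
        return segment.encode("utf-8")

    def get(self, segment: str) -> bytes:
        k = self._key(segment)
        if k not in self._cache:
            self.misses += 1
            self._cache[k] = self._render(segment)  # render only once per segment
        else:
            self.hits += 1
        return self._cache[k]

    def render_history(self, segments: List[str]) -> List[bytes]:
        # Re-rendering a growing history only pays for the newly appended segments.
        return [self.get(s) for s in segments]


cache = SegmentOpticalCache()
history = ["obs1|act1", "obs2|act2"]
cache.render_history(history)                      # 2 misses: both segments rendered
cache.render_history(history + ["obs3|act3"])      # 2 hits, 1 miss: only the new segment
print(cache.hits, cache.misses)                    # → 2 3
```

Because multi-turn rollouts extend the history by one observation-action pair per turn, the per-turn rendering cost stays constant instead of growing with the trajectory length, which is the mechanism behind the reported rendering speedup.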