See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
June 11, 2026
Authors: Siyi Chen, Xiaoyan Zhang, Meng Wu, Jonathan Tremblay, Valts Blukis, Stan Birchfield, Rene Vidal, Alvaro Velasquez, Sijia Liu, Qing Qu
cs.AI
Abstract
Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading" and transfer both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense contextual knowledge preservation. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six directions of {Qwen3-4B, 8B, 14B} and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer where prior methods collapse.
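The abstract does not specify the form of the cross-model cache transformation. As a rough illustration only, the following sketch assumes the simplest possible choice: a single linear map that takes one sender key/value vector to one receiver-space vector. It shows what the first (reconstruction) phase could look like, fitting the map so that transformed sender states match the receiver's own states on the same input. The function names, toy dimensions, plain-SGD optimizer, and synthetic data are all hypothetical. The second (generation) phase, which would train the map through the receiver's next-token loss, is not shown.

```python
import random

# Hypothetical sketch: the abstract does not define the transformation.
# A single linear map and plain SGD are assumptions made for illustration.
Vector = list[float]
Matrix = list[list[float]]


def project(W: Matrix, x: Vector) -> Vector:
    # Assumed cache transformation: linear map from a sender key/value
    # vector (dim d_src) to a receiver-space vector (dim d_dst).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mse(pred: Vector, target: Vector) -> float:
    # Mean squared error between a transformed vector and the receiver's own vector.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mean_loss(W: Matrix, pairs: list[tuple[Vector, Vector]]) -> float:
    # Average reconstruction error over all (sender, receiver) pairs.
    return sum(mse(project(W, x), y) for x, y in pairs) / len(pairs)


def train_reconstruction(pairs, d_src: int, d_dst: int,
                         lr: float = 0.05, epochs: int = 200, seed: int = 0) -> Matrix:
    """Phase 1 sketch: fit W so that project(W, sender_vec) reconstructs receiver_vec.

    Each pair is (sender_vec, receiver_vec), assumed to come from the two models
    reading the same input at the same token position. Optimizer: plain SGD.
    """
    rng = random.Random(seed)
    W = [[0.0] * d_src for _ in range(d_dst)]
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            x, y = pairs[k]
            err = [p - t for p, t in zip(project(W, x), y)]
            for i in range(d_dst):
                g = 2.0 * err[i] / d_dst  # d(MSE)/d(pred_i)
                for j in range(d_src):
                    W[i][j] -= lr * g * x[j]
    return W


# Toy usage: synthetic "receiver" vectors produced by an unknown linear map T.
rng = random.Random(1)
d_src, d_dst = 4, 3
T = [[rng.uniform(-1, 1) for _ in range(d_src)] for _ in range(d_dst)]
xs = [[rng.uniform(-1, 1) for _ in range(d_src)] for _ in range(20)]
pairs = [(x, project(T, x)) for x in xs]

before = mean_loss([[0.0] * d_src for _ in range(d_dst)], pairs)
W = train_reconstruction(pairs, d_src, d_dst)
after = mean_loss(W, pairs)
print(f"reconstruction MSE: {before:.4f} -> {after:.2e}")
```

Because the toy targets are exactly linear in the inputs, the reconstruction error should fall close to zero. Real KV caches would not be this well behaved, which is one reason a separate generation phase is described as following reconstruction.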