See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

Abstract

Multi-agent systems communicate mostly through text, which incurs a lossy and expensive decode and re-encode step. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and thereby sidesteps the central challenge of cross-model latent alignment. Existing heterogeneous methods are also restrictive: they typically assume shared input and use transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading", transferring both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense preservation of contextual knowledge. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six transfer directions among Qwen3-4B, 8B, and 14B and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer, where prior methods collapse.