See What I See, Know What I Think: Dense Latent Communication Between Heterogeneous Agents

Abstract

Multi-agent systems communicate mostly through text, which incurs a lossy and expensive decode-and-re-encode cycle. KV-cache communication is a promising alternative, yet most prior work is homogeneous: it uses duplicate copies of the same model and thereby sidesteps the central challenge of cross-model latent alignment. Existing heterogeneous methods are also restrictive, typically assuming a shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading", transferring both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, whereas context-unaware transfer, in which the receiver sees no input, requires dense preservation of contextual knowledge. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication, using a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six sender-receiver directions among {Qwen3-4B, Qwen3-8B, Qwen3-14B} and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer, where prior methods collapse.
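The idea of a lightweight cross-model cache transformation trained first for reconstruction can be illustrated with a minimal sketch. It is not the authors' method. It assumes, purely for illustration, that (1) each receiver layer is paired with the sender layer at the same relative depth, (2) the transformation is one linear map per receiver layer, applied to every token's flattened key or value vector, and (3) the reconstruction phase regresses the receiver's own cache for the same input with a mean-squared-error loss fitted by plain SGD. The names `map_layers`, `LinearCacheMap`, `fit_reconstruction`, `transform_cache`, and `demo` are hypothetical. The data are synthetic, and the generation phase (training through the receiver's next-token loss) is omitted because it requires the actual models.

```python
import random


def map_layers(n_src: int, n_dst: int) -> list[int]:
    """Hypothetical depth-proportional pairing: for each receiver layer,
    return the index of the sender layer at the same relative depth."""
    if n_dst == 1:
        return [0]
    return [round(j * (n_src - 1) / (n_dst - 1)) for j in range(n_dst)]


class LinearCacheMap:
    """Hypothetical per-layer linear map from sender KV vectors (size d_src)
    to receiver KV vectors (size d_dst): y_hat = W x + b."""

    def __init__(self, d_src: int, d_dst: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(d_src)] for _ in range(d_dst)]
        self.b = [0.0] * d_dst

    def apply(self, x: list[float]) -> list[float]:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]

    def mse(self, pairs) -> float:
        """Mean squared error per element over (sender_vec, receiver_vec) pairs."""
        total, count = 0.0, 0
        for x, y in pairs:
            total += sum((a - t) ** 2 for a, t in zip(self.apply(x), y))
            count += len(y)
        return total / count

    def sgd_step(self, x: list[float], y: list[float], lr: float) -> None:
        """One SGD step on 0.5 * ||W x + b - y||^2 (gradient: e x^T for W, e for b)."""
        err = [a - t for a, t in zip(self.apply(x), y)]
        for i, e in enumerate(err):
            row = self.w[i]
            for j, xj in enumerate(x):
                row[j] -= lr * e * xj
            self.b[i] -= lr * e


def fit_reconstruction(cache_map, pairs, lr=0.05, epochs=50, seed=0) -> float:
    """Assumed reconstruction phase: fit the map so the transformed sender
    vectors match the receiver's own vectors; returns the final MSE."""
    rng = random.Random(seed)
    order = list(pairs)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            cache_map.sgd_step(x, y, lr)
    return cache_map.mse(pairs)


def transform_cache(sender_cache, layer_map, maps):
    """sender_cache[l][t] is token t's vector at sender layer l; the result
    has one entry per receiver layer, built from its paired sender layer."""
    return [[maps[j].apply(vec) for vec in sender_cache[src]]
            for j, src in enumerate(layer_map)]


def demo(d_src=6, d_dst=4, n=64, seed=0):
    """Synthetic check: receiver vectors are an exact linear function of
    sender vectors, so reconstruction should drive the MSE toward zero."""
    rng = random.Random(seed)
    true_w = [[rng.gauss(0.0, 0.5) for _ in range(d_src)] for _ in range(d_dst)]
    pairs = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d_src)]
        y = [sum(w * xi for w, xi in zip(row, x)) for row in true_w]
        pairs.append((x, y))
    cache_map = LinearCacheMap(d_src, d_dst, seed=seed + 1)
    initial = cache_map.mse(pairs)
    final = fit_reconstruction(cache_map, pairs)
    return initial, final


if __name__ == "__main__":
    print("layer map 8 -> 4:", map_layers(8, 4))
    before, after = demo()
    print(f"reconstruction MSE: {before:.4f} -> {after:.6f}")
```

Running the script prints the reconstruction error before and after fitting. Because the synthetic targets are an exact linear function of the inputs, the error should fall close to zero. A linear map per layer keeps the transformation lightweight. This sketch ignores attention-head structure and positional encodings, which a transformation over real KV caches would have to account for.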