Streaming Communication in Multi-Agent Reasoning

Abstract

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness. Multi-step reasoning quality is non-uniform, and early steps are more reliable than later ones, so letting downstream agents build on these reliable early steps rather than on the full chain keeps error-prone late steps from misleading them. We formalize both advantages with the first closed-form joint analysis of the stream, serial, and single protocols, deriving the effectiveness ordering, a speedup upper bound, and the cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing the number of steps per agent consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
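To make the latency argument concrete, the sketch below uses a deliberately simplified pipeline model. It is an illustrative assumption, not the paper's closed-form analysis. It assumes a chain of K agents, S reasoning steps per agent, a uniform time t per step, and that agent k's step j depends only on agent k-1's step j. Under these assumptions, generate-then-transfer costs K·S·t, streaming costs (S + K - 1)·t (classic pipeline fill plus drain), and the speedup KS/(S + K - 1) never exceeds K and approaches it as S grows. The class `PipelineModel` and its methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineModel:
    """Hypothetical toy latency model; NOT the paper's derivation.

    Assumptions: a chain of K agents, each emitting S steps, every step
    taking the same time t, and agent k's step j depending only on
    agent k-1's step j.
    """

    num_agents: int          # K: pipeline depth
    steps_per_agent: int     # S: reasoning steps emitted by each agent
    step_time: float = 1.0   # t: time per step, arbitrary units

    def __post_init__(self) -> None:
        if self.num_agents < 1 or self.steps_per_agent < 1 or self.step_time <= 0:
            raise ValueError("K and S must be >= 1 and t must be > 0")

    def serial_latency(self) -> float:
        # Generate-then-transfer: each agent waits for the full upstream chain,
        # so all K * S steps run one after another.
        return self.num_agents * self.steps_per_agent * self.step_time

    def stream_latency(self) -> float:
        # Streaming: once the pipeline fills (K - 1 slots), one step of the
        # last agent completes per slot, giving S + K - 1 slots in total.
        return (self.steps_per_agent + self.num_agents - 1) * self.step_time

    def speedup(self) -> float:
        # K*S / (S + K - 1): at most K, approaching K as S grows.
        return self.serial_latency() / self.stream_latency()


if __name__ == "__main__":
    for s in (1, 4, 16, 64):
        m = PipelineModel(num_agents=4, steps_per_agent=s)
        print(
            f"S={s:>2}: serial={m.serial_latency():.0f}, "
            f"stream={m.stream_latency():.0f}, speedup={m.speedup():.2f}"
        )
```

With K = 4, the printed speedup rises from 1.00 at S = 1 to about 3.82 at S = 64. In this toy setting only, that mirrors the claim that more steps per agent improve efficiency; the effectiveness gain from reliable early steps is not modeled here.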