Large Language Models over Networks: Collaborative Intelligence under Resource Constraints
May 9, 2026
Authors: Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, H. Vincent Poor, Christopher G. Brinton
cs.AI
Abstract
Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraints, or sustained high-volume inference. On-device deployment is in turn constrained by limited computation and memory. No single endpoint can deliver high-quality service across this spectrum. This article focuses on collaborative intelligence, a paradigm in which multiple independent LLMs distributed across device and cloud endpoints collaborate at the task level through natural language or structured messages. Such collaboration strives for superior response quality under heterogeneous resource constraints spanning computation, memory, communication, and cost across network tiers. We present collaborative inference along two complementary and composable dimensions: vertical device-cloud collaboration and horizontal multi-agent collaboration, which can be combined into hybrid topologies in practice. We then examine learning to collaborate, addressing the training of routing policies and the development of cooperative capabilities among LLMs. Finally, we identify open research challenges including scaling under resource heterogeneity and trustworthy collaborative intelligence.
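The vertical device-cloud dimension mentioned above can be illustrated with a minimal routing-policy sketch: a small on-device model answers first, and the query escalates to the cloud model only when the device's self-reported confidence falls below a threshold. All names here (the model stubs, the `Response` type, and the threshold value) are illustrative assumptions, not APIs from the article.

```python
# Minimal sketch of a confidence-threshold device-cloud cascade.
# The model stubs and threshold are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    confidence: float  # self-reported confidence in [0, 1]
    source: str        # which endpoint produced the answer

def device_llm(query: str) -> Response:
    # Stub for a small on-device model: confident only on short queries.
    conf = 0.9 if len(query.split()) <= 5 else 0.3
    return Response(f"device answer to: {query}", conf, "device")

def cloud_llm(query: str) -> Response:
    # Stub for a large cloud model: assumed reliable but costly.
    return Response(f"cloud answer to: {query}", 0.99, "cloud")

def route(query: str, threshold: float = 0.7) -> Response:
    # Try the cheap endpoint first; escalate only on low confidence.
    local = device_llm(query)
    if local.confidence >= threshold:
        return local
    return cloud_llm(query)
```

In practice the routing decision would weigh the resource constraints the abstract enumerates (latency budget, connectivity, communication cost) rather than a single confidence score, and the policy itself could be trained, which is the "learning to collaborate" direction discussed in the article.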