Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

Abstract

Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraints, or sustained high-volume inference. On-device deployment is in turn constrained by limited computation and memory. No single endpoint can deliver high-quality service across this spectrum. This article focuses on collaborative intelligence, a paradigm in which multiple independent LLMs distributed across device and cloud endpoints collaborate at the task level through natural language or structured messages. Such collaboration strives for superior response quality under heterogeneous resource constraints spanning computation, memory, communication, and cost across network tiers. We present collaborative inference along two complementary and composable dimensions: vertical device-cloud collaboration and horizontal multi-agent collaboration, which can be combined into hybrid topologies in practice. We then examine learning to collaborate, addressing the training of routing policies and the development of cooperative capabilities among LLMs. Finally, we identify open research challenges including scaling under resource heterogeneity and trustworthy collaborative intelligence.
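
As an illustration of the vertical device-cloud dimension described above, the following is a minimal sketch of a confidence-threshold router, not the article's method. It assumes each model returns a self-reported confidence score; the names Reply, route, toy_device, toy_cloud, offline_cloud, and the 0.7 threshold are all hypothetical, and the "models" are stand-in functions rather than real LLM endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    text: str
    confidence: float  # hypothetical self-reported score in [0, 1]
    source: str        # which endpoint produced the reply


# A model is any callable from prompt to Reply; real endpoints are assumed.
Model = Callable[[str], Reply]


def route(prompt: str, device: Model, cloud: Optional[Model],
          threshold: float = 0.7) -> Reply:
    """Answer on the device first; escalate to the cloud only when the
    local confidence is below the threshold and a cloud endpoint exists."""
    local = device(prompt)
    if local.confidence >= threshold or cloud is None:
        return local
    try:
        return cloud(prompt)
    except ConnectionError:
        # Intermittent connectivity: fall back to the local answer.
        return local


def toy_device(prompt: str) -> Reply:
    # Stand-in for a small on-device LLM: confident only on short prompts.
    conf = 0.9 if len(prompt.split()) <= 5 else 0.3
    return Reply(f"device answer to: {prompt}", conf, "device")


def toy_cloud(prompt: str) -> Reply:
    # Stand-in for a larger cloud LLM.
    return Reply(f"cloud answer to: {prompt}", 0.95, "cloud")


def offline_cloud(prompt: str) -> Reply:
    # Simulates an unreachable cloud endpoint.
    raise ConnectionError("cloud unreachable")
```

A fixed threshold is the simplest possible routing policy; the learned routing policies the abstract mentions would presumably replace this rule with a trained decision function that also weighs latency and cost.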
