Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

Abstract

Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraints, or sustained high-volume inference. On-device deployment is in turn constrained by limited computation and memory. No single endpoint can deliver high-quality service across this spectrum. This article focuses on collaborative intelligence, a paradigm in which multiple independent LLMs distributed across device and cloud endpoints collaborate at the task level through natural language or structured messages. Such collaboration strives for superior response quality under heterogeneous resource constraints spanning computation, memory, communication, and cost across network tiers. We present collaborative inference along two complementary and composable dimensions: vertical device-cloud collaboration and horizontal multi-agent collaboration, which can be combined into hybrid topologies in practice. We then examine learning to collaborate, addressing the training of routing policies and the development of cooperative capabilities among LLMs. Finally, we identify open research challenges including scaling under resource heterogeneity and trustworthy collaborative intelligence.
