An Information Theoretic Perspective on Agentic System Design

Abstract

Agentic language model (LM) systems power modern applications like "Deep Research" and "Claude Code," and leverage multi-LM architectures to overcome context limitations. Beneath their apparent diversity lies a recurring pattern: smaller "compressor" LMs (that can even run locally) distill raw context into compact text that is then consumed by larger "predictor" LMs. Despite their popularity, the design of compressor-predictor systems remains largely ad hoc, with little guidance on how compressor and predictor choices shape downstream performance. In practice, attributing gains to compression versus prediction requires costly, task-specific pairwise sweeps. We argue that these agentic system design questions are, at root, information-theoretic. Viewing the compressor LM as a noisy channel, we introduce a simple estimator of the mutual information between the context and its compression to quantify compression quality in a task-independent way. We show that mutual information strongly predicts downstream performance, independent of any specific task. Using this information-theoretic framework, we perform a comprehensive empirical analysis across five datasets and three model families. Results reveal that larger compressors are not only more accurate but also more token-efficient, conveying more bits of information per token. A 7B Qwen-2.5 compressor, for instance, is 1.6× more accurate, 4.6× more concise, and conveys 5.5× more bits of mutual information per token than its 1.5B sibling. Across datasets, scaling compressors is substantially more effective than scaling predictors, enabling larger on-device compressors to pair with smaller cloud predictors. Applied to a Deep Research system, these principles enable local compressors as small as 3B parameters to recover 99% of frontier-LM accuracy at 26% of API costs.
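
To make the idea of estimating mutual information between a context X and its compression Z more concrete, the following is a minimal sketch of one common Monte Carlo approach; it is not necessarily the estimator used in the paper. It assumes the compressor can score log p(z | x) for any pair, and it approximates the marginal p(z) by averaging over the sampled contexts. The function names and the layout of the log_probs matrix are hypothetical choices made for illustration.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of natural-log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def estimate_mutual_information_bits(log_probs):
    """Monte Carlo estimate of I(X; Z) in bits.

    Assumption (hypothetical layout): log_probs[i][j] is the natural-log
    likelihood log p(z_i | x_j) of compression z_i given context x_j, where
    z_i was generated from x_i. The matrix must be square (N x N).
    """
    n = len(log_probs)
    if n == 0 or any(len(row) != n for row in log_probs):
        raise ValueError("log_probs must be a non-empty square matrix")
    total = 0.0
    for i, row in enumerate(log_probs):
        if row[i] == -math.inf:
            raise ValueError("each compression needs nonzero likelihood under its own context")
        # log p(z_i) is approximated by log of the mean of p(z_i | x_j) over j.
        log_marginal = _logsumexp(row) - math.log(n)
        total += row[i] - log_marginal
    return total / n / math.log(2)  # convert nats to bits


def bits_per_token(mi_bits, token_counts):
    """Divide the MI estimate by the mean compression length in tokens."""
    return mi_bits / (sum(token_counts) / len(token_counts))


# Toy check: each compression is only possible under its own context.
NEG_INF = -math.inf
informative = [[0.0 if i == j else NEG_INF for j in range(4)] for i in range(4)]
# Toy check: every compression is equally likely under every context.
uninformative = [[-3.0] * 4 for _ in range(4)]

mi_informative = estimate_mutual_information_bits(informative)
mi_uninformative = estimate_mutual_information_bits(uninformative)
```

In this sketch the estimate can never exceed log2(N) bits for N sampled contexts, so the perfectly distinguishable toy case with N = 4 yields 2 bits, and the case where compressions carry no information about their contexts yields 0 bits. Dividing the estimate by the mean compression length gives a bits-per-token figure in the spirit of the token-efficiency comparison above.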
