What Limits Agentic Systems Efficiency?
October 18, 2025
Authors: Song Bian, Minghao Yan, Anand Jayarajan, Gennady Pekhimenko, Shivaram Venkataraman
cs.AI
Abstract
Large Language Models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have
demonstrated strong reasoning capabilities. To further enhance LLM
capabilities, recent agentic systems, such as Deep Research, incorporate web
interactions into LLM reasoning to mitigate uncertainties and reduce potential
errors. However, existing research predominantly focuses on reasoning
performance, often neglecting the efficiency of agentic systems. In this work,
we present a comprehensive empirical study that identifies efficiency
bottlenecks in web-interactive agentic systems. We decompose end-to-end latency
into two primary components: LLM API latency and web environment latency. Our
study spans 15 models from 5 providers and reveals high variability in
API-based agentic systems. We observe that web environment latency can
contribute as much as 53.7% of the overall latency in a web-based agentic
system. To reduce latency, we propose SpecCache, a caching
framework augmented with speculative execution that can reduce web environment
overhead. Extensive evaluations on two standard benchmarks show that our
approach improves the cache hit rate by up to 58x compared to a random caching
strategy, while reducing web environment overhead by up to 3.2x, without
degrading agentic system performance.
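
The core idea behind SpecCache, combining a cache with speculative execution, can be illustrated with a minimal sketch. This is not the paper's implementation: the `fetch` and `predict` interfaces, and the idea of running a cheap draft predictor to prefetch likely web actions while the target LLM is still generating, are assumptions made for illustration.

```python
import threading


class SpecCacheSketch:
    """Minimal sketch of a speculative-execution cache for web actions.

    While the main agent waits on a slow LLM API call, a cheap draft
    predictor guesses the agent's likely next web actions and prefetches
    their results, so a cache hit skips the web environment round trip.
    """

    def __init__(self, fetch, predict):
        self.fetch = fetch      # slow web-environment call: action -> result
        self.predict = predict  # cheap draft predictor: context -> [actions]
        self.cache = {}
        self.lock = threading.Lock()

    def speculate(self, context):
        # Intended to run in the background while the target model thinks.
        for action in self.predict(context):
            with self.lock:
                cached = action in self.cache
            if not cached:
                result = self.fetch(action)  # prefetch outside the lock
                with self.lock:
                    self.cache[action] = result

    def execute(self, action):
        with self.lock:
            if action in self.cache:         # hit: no web latency paid
                return self.cache.pop(action)
        return self.fetch(action)            # miss: pay full web latency
```

A toy usage: with `fetch = lambda a: f"page:{a}"` and a predictor that guesses `["search q"]`, calling `speculate(ctx)` then `execute("search q")` returns the prefetched result without a second fetch, while an unpredicted action falls through to the web environment.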