A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining
October 1, 2025
Authors: Sipeng Zhang, Longfei Yun, Zilong Wang, Jingbo Shang, Letian Peng
cs.AI
Abstract
At the core of Deep Research is knowledge mining, the task of extracting
structured information from massive unstructured text in response to user
instructions. Large language models (LLMs) excel at interpreting such
instructions but are prohibitively expensive to deploy at scale, while
traditional pipelines of classifiers and extractors remain efficient yet
brittle and unable to generalize to new tasks. We introduce Falconer, a
collaborative framework that combines the agentic reasoning of LLMs with
lightweight proxy models for scalable knowledge mining. In Falconer, LLMs act
as planners, decomposing user instructions into executable pipelines, and as
annotators, generating supervision to train small proxies. The framework
unifies classification and extraction into two atomic operations, get label and
get span, enabling a single instruction-following model to replace multiple
task-specific components. To evaluate the consistency between proxy models
incubated by Falconer and annotations provided by humans and large models, we
construct new benchmarks covering both planning and end-to-end execution.
Experiments show that Falconer closely matches state-of-the-art LLMs in
instruction-following accuracy while reducing inference cost by up to 90% and
accelerating large-scale knowledge mining by more than 20x, offering an
efficient and scalable foundation for Deep Research.
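The unification of classification and extraction into two atomic operations can be illustrated with a minimal sketch. The interface below is hypothetical (the function names, prompt format, and `stub_proxy` stand-in are assumptions for illustration, not Falconer's actual API); it only shows how a single instruction-following proxy model could serve both operation types.

```python
def get_label(model, instruction, text, labels):
    """Classification as an atomic op: the proxy picks one label for the text."""
    prompt = f"{instruction}\nLabels: {', '.join(labels)}\nText: {text}"
    return model(prompt)

def get_span(model, instruction, text):
    """Extraction as an atomic op: the proxy returns matching spans from the text."""
    prompt = f"{instruction}\nText: {text}"
    return [s for s in model(prompt).split("; ") if s]

# Stub standing in for a trained small proxy model (illustration only).
def stub_proxy(prompt):
    if "Labels:" in prompt:
        return "positive"
    return "Falconer; Deep Research"

print(get_label(stub_proxy, "Classify the sentiment.", "Great results!",
                ["positive", "negative"]))
print(get_span(stub_proxy, "Extract system names.",
               "Falconer supports Deep Research."))
```

Because both operations share one prompt-in, text-out interface, a single lightweight model can replace separate task-specific classifiers and extractors, which is what enables the cost and speed gains reported above.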