A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining
October 1, 2025
Authors: Sipeng Zhang, Longfei Yun, Zilong Wang, Jingbo Shang, Letian Peng
cs.AI
Abstract
At the core of Deep Research is knowledge mining, the task of extracting
structured information from massive unstructured text in response to user
instructions. Large language models (LLMs) excel at interpreting such
instructions but are prohibitively expensive to deploy at scale, while
traditional pipelines of classifiers and extractors remain efficient yet
brittle and unable to generalize to new tasks. We introduce Falconer, a
collaborative framework that combines the agentic reasoning of LLMs with
lightweight proxy models for scalable knowledge mining. In Falconer, LLMs act
as planners, decomposing user instructions into executable pipelines, and as
annotators, generating supervision to train small proxies. The framework
unifies classification and extraction into two atomic operations, get label and
get span, enabling a single instruction-following model to replace multiple
task-specific components. To evaluate the consistency between proxy models
incubated by Falconer and annotations provided by humans and large models, we
construct new benchmarks covering both planning and end-to-end execution.
Experiments show that Falconer closely matches state-of-the-art LLMs in
instruction-following accuracy while reducing inference cost by up to 90% and
accelerating large-scale knowledge mining by more than 20x, offering an
efficient and scalable foundation for Deep Research.
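To make the two atomic operations concrete, here is a minimal sketch, not the authors' implementation: the names ProxyModel, get_label, and get_span, and their signatures, are illustrative assumptions about how a single instruction-following proxy could sit behind both classification and extraction.

```python
from typing import List

class ProxyModel:
    """Stand-in for the small instruction-following proxy that Falconer
    incubates; the real model, training data, and prompt format are
    described in the paper, not here."""

    def generate(self, instruction: str, text: str) -> str:
        raise NotImplementedError  # e.g. a distilled instruction-following LM

def get_label(proxy: ProxyModel, instruction: str, text: str,
              labels: List[str]) -> str:
    """Classification as one atomic operation: pick a label for the text."""
    prompt = f"{instruction}\nOptions: {', '.join(labels)}"
    output = proxy.generate(prompt, text).strip()
    return output if output in labels else labels[0]  # crude fallback

def get_span(proxy: ProxyModel, instruction: str, text: str) -> List[str]:
    """Extraction as the other atomic operation: return verbatim spans."""
    output = proxy.generate(instruction, text)
    # Assume one candidate span per line; keep only spans found in the text.
    return [s.strip() for s in output.splitlines()
            if s.strip() and s.strip() in text]
```

Under this reading, an LLM planner would compose calls to these two operations into a pipeline, while the proxy model executes them cheaply at corpus scale.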