Open Deep Search: Democratizing Search with Open-source Reasoning Agents

Abstract

We introduce Open Deep Search (ODS) to close the widening gap between proprietary search AI solutions, such as Perplexity's Sonar Reasoning Pro and OpenAI's GPT-4o Search Preview, and their open-source counterparts. The main innovation in ODS is to augment the reasoning capabilities of the latest open-source LLMs with reasoning agents that can judiciously use web search tools to answer queries. Concretely, ODS consists of two components that work with a base LLM chosen by the user: Open Search Tool and Open Reasoning Agent. Open Reasoning Agent interprets the given task and completes it by orchestrating a sequence of actions that includes calling tools, one of which is Open Search Tool. Open Search Tool is a novel web search tool that outperforms proprietary counterparts. Together with powerful open-source reasoning LLMs, such as DeepSeek-R1, ODS nearly matches and sometimes surpasses the existing state-of-the-art baselines on two benchmarks: SimpleQA and FRAMES. For example, on the FRAMES evaluation benchmark, ODS improves on the best existing baseline, the recently released GPT-4o Search Preview, by 9.7% in accuracy. ODS is a general framework for seamlessly augmenting any LLM with search and reasoning capabilities to achieve state-of-the-art performance. For example, DeepSeek-R1 achieves 82.4% on SimpleQA and 30.1% on FRAMES on its own; augmented with ODS, it reaches 88.3% on SimpleQA and 75.3% on FRAMES.
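The division of labor described above, a reasoning agent that repeatedly asks a base LLM for the next action and calls a search tool when requested, can be illustrated with a minimal sketch. This is not the ODS implementation: every name here (`run_agent`, `build_prompt`, `parse_action`, `toy_llm`, `toy_search`) and the `action: argument` reply format are assumptions made for illustration, and both the LLM and the search tool are scripted stand-ins rather than real services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from ODS.
Tool = Callable[[str], str]
LLM = Callable[[str], str]


@dataclass
class Step:
    action: str            # tool name, or "final"
    argument: str          # tool input, or the final answer
    observation: str = ""  # tool output (empty for the final step)


def build_prompt(task: str, trace: List[Step]) -> str:
    """Render the task and all previous steps as plain text for the LLM."""
    lines = [f"Task: {task}"]
    for step in trace:
        lines.append(f"Action: {step.action}: {step.argument}")
        lines.append(f"Observation: {step.observation}")
    lines.append("Next action (search: <query> or final: <answer>):")
    return "\n".join(lines)


def parse_action(reply: str) -> Tuple[str, str]:
    """Split a reply of the assumed form 'action: argument'."""
    action, _, argument = reply.partition(":")
    return action.strip().lower(), argument.strip()


def run_agent(task: str, llm: LLM, tools: Dict[str, Tool],
              max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Alternate between asking the LLM for an action and executing it."""
    trace: List[Step] = []
    for _ in range(max_steps):
        action, argument = parse_action(llm(build_prompt(task, trace)))
        if action == "final":
            trace.append(Step(action, argument))
            return argument, trace
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown tool: {action}"
        trace.append(Step(action, argument, observation))
    return "", trace  # step budget exhausted without a final answer


def toy_search(query: str) -> str:
    """Stand-in for a web search tool: looks up a one-entry corpus."""
    corpus = {"capital of france": "Paris is the capital of France."}
    return corpus.get(query.lower(), "no results")


def toy_llm(prompt: str) -> str:
    """Scripted stand-in for a base LLM: search once, then answer."""
    if "Observation:" not in prompt:
        return "search: capital of France"
    return "final: Paris"


answer, trace = run_agent("What is the capital of France?",
                          toy_llm, {"search": toy_search})
print(answer)
```

Because the agent only sees `llm` and `tools` as plain callables, swapping in a different base model or an additional tool would not change the loop itself, which is one plausible reading of how a framework of this kind can work with a base LLM chosen by the user.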