Open Deep Search: Democratizing Search with Open-source Reasoning Agents

Abstract

We introduce Open Deep Search (ODS) to close the growing gap between proprietary search AI solutions, such as Perplexity's Sonar Reasoning Pro and OpenAI's GPT-4o Search Preview, and their open-source counterparts. The main innovation in ODS is to augment the reasoning capabilities of the latest open-source LLMs with reasoning agents that can judiciously use web search tools to answer queries. Concretely, ODS consists of two components that work with a base LLM chosen by the user: Open Search Tool and Open Reasoning Agent. Open Reasoning Agent interprets the given task and completes it by orchestrating a sequence of actions that includes calling tools, one of which is Open Search Tool. Open Search Tool is a novel web search tool that outperforms proprietary counterparts. Together with powerful open-source reasoning LLMs, such as DeepSeek-R1, ODS nearly matches and sometimes surpasses the existing state-of-the-art baselines on two benchmarks: SimpleQA and FRAMES. For example, on the FRAMES evaluation benchmark, ODS improves on the best existing baseline, the recently released GPT-4o Search Preview, by 9.7% in accuracy. ODS is a general framework for seamlessly augmenting any LLM with search and reasoning capabilities to achieve state-of-the-art performance: DeepSeek-R1, for example, achieves 82.4% on SimpleQA and 30.1% on FRAMES on its own, and 88.3% on SimpleQA and 75.3% on FRAMES with ODS.
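The division of labor described in the abstract, an agent that interprets a task and orchestrates tool calls, one of which is a search tool, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Action`, `run_agent`, `toy_policy`, and `toy_search` are hypothetical names, the policy function merely stands in for the user-chosen base LLM, and the search tool is a dictionary lookup rather than a web search. It is not the ODS implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """Hypothetical action record; not taken from the ODS codebase."""
    kind: str       # "tool" to call a tool, "answer" to finish
    name: str = ""  # tool name when kind == "tool"
    arg: str = ""   # tool input, or the final answer text


Tool = Callable[[str], str]
Policy = Callable[[str, List[str]], Action]


def run_agent(task: str, policy: Policy, tools: Dict[str, Tool],
              max_steps: int = 5) -> str:
    """Ask the policy (a stand-in for the base LLM) for the next action,
    run the chosen tool, and feed the observation back until it answers."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(task, observations)
        if action.kind == "answer":
            return action.arg
        tool = tools.get(action.name)
        if tool is None:
            observations.append(f"unknown tool: {action.name}")
        else:
            observations.append(tool(action.arg))
    return "no answer within step budget"


def toy_search(query: str) -> str:
    """Stand-in for a search tool: a lookup in a tiny in-memory corpus."""
    corpus = {"capital of france": "Paris is the capital of France."}
    return corpus.get(query.lower(), "no results")


def toy_policy(task: str, observations: List[str]) -> Action:
    """Stand-in for the LLM: search once, then answer with the last result."""
    if not observations:
        return Action("tool", "search", task)
    return Action("answer", arg=observations[-1])


if __name__ == "__main__":
    print(run_agent("capital of France", toy_policy, {"search": toy_search}))
```

Keeping the policy and the tools as plain callables mirrors the abstract's point that the framework works with a base LLM chosen by the user: in this sketch, swapping the model or the search backend means passing a different function.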
