OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

Abstract

Solving complex reasoning tasks may involve visual understanding, domain knowledge retrieval, numerical calculation, and multi-step reasoning. Existing methods augment large language models (LLMs) with external tools, but they are restricted to specialized domains or a limited set of tool types, or they require additional training data. In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards that encapsulate tool functionality, a planner that performs both high-level and low-level planning, and an executor that carries out tool usage. We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Furthermore, when given the same set of tools, OctoTools outperforms AutoGen, GPT-Functions, and LangChain by up to 10.6%. Comprehensive analyses and ablations show that OctoTools offers advantages in task planning, effective tool usage, and multi-step problem solving.
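The abstract names three components: tool cards, a planner that works at both a high level (which tools to use) and a low level (the next concrete action), and an executor. The Python below is a minimal sketch of how such a loop could fit together under stated assumptions. `ToolCard`, `Step`, `Memory`, `ScriptedPlanner`, `Executor`, `solve`, and the toy tools `word_counter` and `squarer` are all hypothetical names, and the planner is a fixed rule-based stand-in for an LLM planner. None of this is OctoTools' actual API or planning method.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class ToolCard:
    """Hypothetical tool card: metadata a planner can read, plus the wrapped tool."""
    name: str
    description: str
    input_types: Dict[str, str]  # argument name -> expected type (informal)
    output_type: str
    run: Callable[..., Any]


@dataclass
class Step:
    """One low-level action: which tool to call, why, and with which arguments."""
    tool_name: str
    sub_goal: str
    args: Dict[str, Any]


@dataclass
class Memory:
    """Trajectory of executed steps and their results."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def last_result(self) -> Any:
        return self.records[-1]["result"] if self.records else None


class ScriptedPlanner:
    """Deterministic stand-in for an LLM planner (an assumption for illustration)."""

    def __init__(self, tool_order: List[str]):
        self.tool_order = tool_order

    def high_level_plan(self, query: str, cards: Dict[str, ToolCard]) -> List[str]:
        # High-level plan: the ordered tools to use, keeping only tools that have a card.
        return [name for name in self.tool_order if name in cards]

    def next_step(self, query: str, plan: List[str], cards: Dict[str, ToolCard],
                  memory: Memory) -> Optional[Step]:
        # Low-level plan: pick the next tool in the plan and fill its first argument,
        # using the raw query for the first step and the previous result afterwards.
        i = len(memory.records)
        if i >= len(plan):
            return None  # plan exhausted, so stop
        card = cards[plan[i]]
        arg_name = next(iter(card.input_types))
        value = query if i == 0 else memory.last_result()
        return Step(card.name, f"apply {card.name}", {arg_name: value})


class Executor:
    """Runs the tool named in a step and records the outcome in memory."""

    def execute(self, step: Step, cards: Dict[str, ToolCard], memory: Memory) -> Any:
        result = cards[step.tool_name].run(**step.args)
        memory.records.append({"tool": step.tool_name, "args": step.args, "result": result})
        return result


def solve(query: str, cards: Dict[str, ToolCard], planner: ScriptedPlanner,
          executor: Executor, max_steps: int = 5) -> Tuple[Any, Memory]:
    """Plan once at a high level, then alternate low-level planning and execution."""
    memory = Memory()
    plan = planner.high_level_plan(query, cards)
    for _ in range(max_steps):
        step = planner.next_step(query, plan, cards, memory)
        if step is None:
            break
        executor.execute(step, cards, memory)
    return memory.last_result(), memory


# Usage with two toy tools (hypothetical, chosen only for illustration).
cards = {
    "word_counter": ToolCard("word_counter", "Counts whitespace-separated words.",
                             {"text": "str"}, "int", lambda text: len(text.split())),
    "squarer": ToolCard("squarer", "Squares an integer.",
                        {"x": "int"}, "int", lambda x: x * x),
}
answer, memory = solve("tools for complex reasoning", cards,
                       ScriptedPlanner(["word_counter", "squarer"]), Executor())
print(answer)  # 16
```

In this sketch, adding a tool only means adding a card to `cards`; the planner and executor code stay unchanged.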
