Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training

Abstract

Due to the scarcity of agent-oriented pre-training data, LLM-based autonomous agents typically rely on complex prompting or extensive fine-tuning, which often fails to introduce new capabilities while preserving strong generalizability. We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic reasoning and planning, and adapting to environmental feedback. Hephaestus-Forge comprises 103B agent-specific data encompassing 76,537 APIs, including both tool documentation to introduce knowledge of API functions and function calling trajectories to strengthen intrinsic reasoning. To explore effective training protocols, we investigate scaling laws to identify the optimal recipe in data mixing ratios. By continual pre-training on Hephaestus-Forge, Hephaestus outperforms small- to medium-scale open-source LLMs and rivals commercial LLMs on three agent benchmarks, demonstrating the effectiveness of our pre-training corpus in enhancing fundamental agentic capabilities and generalization of LLMs to new tasks or environments.
