Large Action Models: From Inception to Implementation

Abstract

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires a transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), which are designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide to the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications. The code for the data collection process used in this paper is publicly available at https://github.com/microsoft/UFO/tree/main/dataflow, and comprehensive documentation can be found at https://microsoft.github.io/UFO/dataflow/overview/.
