A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects

Abstract

The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied, limiting our understanding of how open LLM projects are initiated, organized, and governed, and of what opportunities exist to foster this ecosystem further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves, encompassing datasets, benchmarks, open source frameworks, leaderboards, knowledge sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single-company projects to non-profit-sponsored grassroots projects, which vary in their centralization of control and in the community engagement strategies they use throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.