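A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects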

Abstract

The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied, limiting our understanding of how open LLM projects are initiated, organized, and governed as well as what opportunities there are to foster this ecosystem even further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves, encompassing datasets, benchmarks, open source frameworks, leaderboards, knowledge sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single company projects to non-profit-sponsored grassroots projects, which vary in their centralization of control and community engagement strategies used throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.
