A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects
September 29, 2025
Authors: Johan Linåker, Cailean Osborne, Jennifer Ding, Ben Burtenshaw
cs.AI
Abstract
The proliferation of open large language models (LLMs) is fostering a vibrant
ecosystem of research and innovation in artificial intelligence (AI). However,
the methods of collaboration used to develop open LLMs both before and after
their public release have not yet been comprehensively studied, limiting our
understanding of how open LLM projects are initiated, organized, and governed
as well as what opportunities there are to foster this ecosystem even further.
We address this gap through an exploratory analysis of open collaboration
throughout the development and reuse lifecycle of open LLMs, drawing on
semi-structured interviews with the developers of 14 open LLMs from grassroots
projects, research institutes, startups, and Big Tech companies in North
America, Europe, Africa, and Asia. We make three key contributions to research
and practice. First, collaboration in open LLM projects extends far beyond the
LLMs themselves, encompassing datasets, benchmarks, open source frameworks,
leaderboards, knowledge sharing and discussion forums, and compute
partnerships, among others. Second, open LLM developers have a variety of
social, economic, and technological motivations, from democratizing AI access
and promoting open science to building regional ecosystems and expanding
language representation. Third, the sampled open LLM projects exhibit five
distinct organizational models, ranging from single-company projects to
non-profit-sponsored grassroots projects, which vary in how centralized their
control is and in the community engagement strategies they use throughout the
open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking
to support the global community building a more open future for AI.