A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects

Abstract

The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs, both before and after their public release, have not yet been comprehensively studied. This limits our understanding of how open LLM projects are initiated, organized, and governed, as well as what opportunities exist to foster this ecosystem further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves, encompassing datasets, benchmarks, open source frameworks, leaderboards, knowledge sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single-company projects to non-profit-sponsored grassroots projects, which vary in their centralization of control and in the community engagement strategies used throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.

