h2oGPT: Democratizing Large Language Models

Abstract

Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications through natural language processing. However, they also pose significant risks, such as the presence of biased, private, or harmful text and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of LLMs based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with, and as part of, the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models ranging from 7 to 40 billion parameters, ready for commercial use under the fully permissive Apache 2.0 license. Our release also includes 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower barriers to entry, allowing individuals and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share the benefits of AI fairly, and H2O.ai will continue to democratize AI and LLMs.
