h2oGPT: Democratizing Large Language Models

Abstract

Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications through natural language processing. However, they also pose many significant risks, such as the presence of biased, private, or harmful text and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of LLMs based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 billion parameters, ready for commercial use under the fully permissive Apache 2.0 license. Included in our release is 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share AI benefits fairly, and H2O.ai will continue to democratize AI and LLMs.

