h2oGPT: Democratizing Large Language Models

Abstract

Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications through natural language processing. However, they also pose significant risks, such as the presence of biased, private, or harmful text and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of LLMs based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with, and as part of, the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models ranging from 7 to 40 billion parameters, ready for commercial use under the fully permissive Apache 2.0 license. Our release also includes 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower barriers to entry, allowing individuals and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share the benefits of AI fairly, and H2O.ai will continue to democratize AI and LLMs.
