WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences

Abstract

We present WebGLM, a web-enhanced question-answering system based on the General Language Model (GLM). Its goal is to augment a pre-trained large language model (LLM) with web search and retrieval capabilities while remaining efficient enough for real-world deployment. To achieve this, we develop WebGLM with strategies for an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer. Specifically, we identify and address the limitations of WebGPT (OpenAI), which gives WebGLM advantages in accuracy, efficiency, and cost-effectiveness. In addition, we propose systematic criteria for evaluating web-enhanced QA systems. We conduct multi-dimensional human evaluation and quantitative ablation studies, which suggest that the proposed WebGLM designs outperform existing systems. WebGLM with the 10-billion-parameter GLM (10B) is shown to perform better than the similar-sized WebGPT (13B) and even comparably to WebGPT (175B) in human evaluation. The code, demo, and data are at https://github.com/THUDM/WebGLM.
