WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences

Abstract

We present WebGLM, a web-enhanced question-answering system based on the General Language Model (GLM). Its goal is to augment a pre-trained large language model (LLM) with web search and retrieval capabilities while remaining efficient enough for real-world deployment. To this end, we build WebGLM around three strategies: an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer. Specifically, we identify and address the limitations of WebGPT (OpenAI), which gives WebGLM advantages in accuracy, efficiency, and cost-effectiveness. In addition, we propose systematic criteria for evaluating web-enhanced QA systems. Multi-dimensional human evaluation and quantitative ablation studies show that the proposed WebGLM designs outperform existing systems. In human evaluation, WebGLM with the 10-billion-parameter GLM (10B) performs better than the similar-sized WebGPT (13B) and even comparably to WebGPT (175B). The code, demo, and data are at https://github.com/THUDM/WebGLM.
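The abstract names three components but does not say how they are wired together. As a rough illustration only, the sketch below composes them into a single inference path, assuming the retriever returns reference passages for a question, the generator returns several candidate answers conditioned on those references, and the scorer assigns each candidate a preference score so that the highest-scoring answer is returned. Every name here (`Reference`, `answer`, and the `toy_*` functions) is hypothetical and not taken from the WebGLM code; the toy components are placeholders, not the LLM-augmented retriever, bootstrapped generator, or trained scorer themselves.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical container for a retrieved web passage (not from the WebGLM codebase).
@dataclass
class Reference:
    url: str
    text: str


# Hypothetical interfaces for the three components named in the abstract.
Retriever = Callable[[str], List[Reference]]
Generator = Callable[[str, List[Reference]], List[str]]
Scorer = Callable[[str, str], float]


def answer(question: str, retrieve: Retriever, generate: Generator, score: Scorer) -> str:
    """Retrieve references, generate candidate answers, and return the one the scorer prefers."""
    references = retrieve(question)
    candidates = generate(question, references)
    if not candidates:
        raise ValueError("generator produced no candidate answers")
    # Assumption: a higher score means the answer is more preferred by humans.
    return max(candidates, key=lambda candidate: score(question, candidate))


# Toy stand-ins, for illustration only.
def toy_retrieve(question: str) -> List[Reference]:
    return [Reference("https://example.com/a", "GLM is a general language model.")]


def toy_generate(question: str, refs: List[Reference]) -> List[str]:
    # One bare candidate and one that quotes and cites the first reference.
    return ["GLM.", f"{refs[0].text} [1]"]


def toy_score(question: str, candidate: str) -> float:
    # Toy preference: longer answers score higher (a real scorer would be a trained model).
    return float(len(candidate))


print(answer("What is GLM?", toy_retrieve, toy_generate, toy_score))
# prints "GLM is a general language model. [1]"
```

Passing the components in as callables keeps the pipeline agnostic to how each stage is implemented, so a real retriever, generator, or scorer could replace a toy stand-in without changing `answer`.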
