Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering
May 19, 2023
Authors: Zezhong Wang, Fangkai Yang, Pu Zhao, Lu Wang, Jue Zhang, Mohit Garg, Qingwei Lin, Dongmei Zhang
cs.AI
Abstract
Large Language Models (LLMs) have gained popularity and achieved remarkable
results in open-domain tasks, but their performance in real industrial
domain-specific scenarios is mediocre because they lack domain-specific
knowledge. This issue has attracted widespread attention, but few relevant
benchmarks are available. In this paper, we provide a benchmark Question
Answering (QA) dataset named MSQA, which concerns Microsoft products and IT
technical problems encountered by customers. This dataset contains industry
cloud-specific QA knowledge that is unavailable to general LLMs, making it
well suited for evaluating methods aimed at improving the domain-specific
capabilities of LLMs. In addition, we propose a new model interaction paradigm
that can empower an LLM to achieve better performance on domain-specific tasks
in which it is not proficient. Extensive experiments demonstrate that the
approach following our model fusion framework outperforms commonly used
LLM-with-retrieval methods.