Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering
May 19, 2023
Authors: Zezhong Wang, Fangkai Yang, Pu Zhao, Lu Wang, Jue Zhang, Mohit Garg, Qingwei Lin, Dongmei Zhang
cs.AI
Abstract
Large Language Models (LLMs) have gained popularity and achieved remarkable
results in open-domain tasks, but their performance in real industrial
domain-specific scenarios is mediocre because they lack domain-specific
knowledge. This issue has attracted widespread attention, but few relevant
benchmarks are available. In this paper, we provide a benchmark Question
Answering (QA) dataset named MSQA, which concerns Microsoft products and IT
technical problems encountered by customers. This dataset contains industry
cloud-specific QA knowledge that is unavailable to general LLMs, making it
well suited for evaluating methods aimed at improving the domain-specific
capabilities of LLMs. In addition, we propose a new model interaction paradigm
that can empower an LLM to achieve better performance on domain-specific tasks
in which it is not proficient. Extensive experiments demonstrate that the
approach following our model fusion framework outperforms commonly used
LLM-with-retrieval methods.