TrustLLM: Trustworthiness in Large Language Models

Abstract

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Ensuring the trustworthiness of LLMs has therefore emerged as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of the trustworthiness of mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we establish a benchmark across six of these dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, using over 30 datasets. First, our findings show that trustworthiness and utility (i.e., functional effectiveness) are, in general, positively related. Second, proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Third, some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently refusing to respond. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing which specific trustworthiness technologies have been employed is crucial for analyzing their effectiveness.
