TrustLLM: Trustworthiness in Large Language Models

Abstract

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Ensuring the trustworthiness of LLMs has therefore emerged as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs on TrustLLM, which consists of over 30 datasets. First, our findings show that, in general, trustworthiness and utility (i.e., functional effectiveness) are positively related. Second, proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Third, some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing which specific trustworthiness technologies have been employed is crucial for analyzing their effectiveness.
