A Survey on the Honesty of Large Language Models
September 27, 2024
Authors: Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam
cs.AI
Abstract
Honesty is a fundamental principle for aligning large language models (LLMs)
with human values, requiring these models to recognize what they know and don't
know and be able to faithfully express their knowledge. Despite this promise,
current LLMs still exhibit significant dishonest behaviors, such as confidently
presenting wrong answers or failing to express what they know. In addition,
research on the honesty of LLMs also faces challenges, including varying
definitions of honesty, difficulties in distinguishing between known and
unknown knowledge, and a lack of comprehensive understanding of related
research. To address these issues, we provide a survey on the honesty of LLMs,
covering its clarification, evaluation approaches, and strategies for
improvement. Moreover, we offer insights for future research, aiming to inspire
further exploration in this important area.