A Survey on the Honesty of Large Language Models

Abstract

Honesty is a fundamental principle for aligning large language models (LLMs) with human values. It requires these models to recognize what they know and what they don't know, and to express their knowledge faithfully. Despite their promise, current LLMs still exhibit significant dishonest behaviors, such as confidently presenting wrong answers or failing to express what they know. Research on LLM honesty also faces challenges, including varying definitions of honesty, the difficulty of distinguishing known from unknown knowledge, and the lack of a comprehensive understanding of related research. To address these issues, we provide a survey on the honesty of LLMs, covering how honesty is defined, how it is evaluated, and strategies for improving it. Finally, we offer insights for future research, aiming to inspire further exploration in this important area.
