Societal Alignment Frameworks Can Improve LLM Alignment

Abstract

Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process termed alignment. However, aligning LLMs remains challenging because of the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: it is impractical to specify a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and we discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view of LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than seeking to perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.
