Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Models

Abstract

To strengthen social bonds with their interlocutors, humans naturally learn to respond appropriately in a given situation by considering which conversational skill best suits the response. We call this process skill-of-mind. For large language model (LLM)-based conversational agents, planning appropriate conversational skills as humans do is challenging because social dialogue is complex, especially in interactive scenarios. To address this, we propose a skill-of-mind-annotated conversation dataset, named Multifaceted Skill-of-Mind, which covers multi-turn and multifaceted conversational skills across various interactive scenarios (e.g., long-term, counseling, task-oriented) and is grounded in diverse social contexts (e.g., demographics, persona, rules of thumb). The dataset contains roughly 100K conversations. Using this dataset, we introduce a new family of skill-of-mind-infused LLMs, named Thanos, with model sizes of 1B, 3B, and 8B parameters. Through extensive experiments, we show that these models successfully carry out the skill-of-mind process and generalize well when inferring multifaceted skills across a variety of domains. Moreover, we show that Thanos significantly improves the quality of responses generated by LLM-based conversational agents and promotes prosocial behavior in human evaluations.