Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model

Abstract

To strengthen social bonds with their interlocutors, humans naturally learn to respond appropriately in a given situation by considering which conversational skill best suits the response, a process we call skill-of-mind. For large language model (LLM)-based conversational agents, planning appropriate conversational skills as humans do is challenging because of the complexity of social dialogue, especially in interactive scenarios. To address this, we propose a skill-of-mind-annotated conversation dataset, named Multifaceted Skill-of-Mind, which includes multi-turn and multifaceted conversational skills across various interactive scenarios (e.g., long-term, counseling, task-oriented), grounded in diverse social contexts (e.g., demographics, persona, rules of thumb). The dataset consists of roughly 100K conversations. Using this dataset, we introduce a new family of skill-of-mind-infused LLMs, named Thanos, with model sizes of 1B, 3B, and 8B parameters. In extensive experiments, these models successfully demonstrate the skill-of-mind process and exhibit strong generalizability in inferring multifaceted skills across a variety of domains. Moreover, we show that Thanos significantly enhances the quality of responses generated by LLM-based conversational agents and promotes prosocial behavior in human evaluations.