TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

May 30, 2025
Authors: Guiyang Hou, Xing Gao, Yuchuan Wu, Xiang Huang, Wenqi Zhang, Zhe Zheng, Yongliang Shen, Jialu Du, Fei Huang, Yongbin Li, Weiming Lu
cs.AI

Abstract

Recently, Large Language Models (LLMs) have made significant progress in IQ-related domains that require careful thinking, such as mathematics and coding. However, enhancing LLMs' cognitive development in social domains, particularly from a post-training perspective, remains underexplored. The social world follows a distinct timeline and requires a richer blend of cognitive modes, ranging from intuitive reactions (System 1) and surface-level thinking to deliberate thinking (System 2), than mathematics, which relies primarily on System 2 cognition (careful, step-by-step reasoning). Motivated by this, we introduce Temporal-aware Hierarchical Cognitive Reinforcement Learning (TimeHC-RL) for enhancing LLMs' social intelligence. In our experiments, we systematically explore ways to improve LLMs' social intelligence and validate the effectiveness of TimeHC-RL by comparing it with five other post-training paradigms and two test-time intervention paradigms on eight datasets with diverse data patterns. The results show that TimeHC-RL outperforms the widely adopted System 2 RL method and enables a 7B backbone model to rival the performance of advanced models such as DeepSeek-R1 and OpenAI-O3. This systematic exploration, from both post-training and test-time intervention perspectives, also uncovers several valuable insights.
June 5, 2025