TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

May 30, 2025
Authors: Guiyang Hou, Xing Gao, Yuchuan Wu, Xiang Huang, Wenqi Zhang, Zhe Zheng, Yongliang Shen, Jialu Du, Fei Huang, Yongbin Li, Weiming Lu
cs.AI

Abstract

Large Language Models (LLMs) have recently made significant progress in IQ-related domains that require careful thinking, such as mathematics and coding. However, enhancing LLMs' cognitive development in social domains, particularly from a post-training perspective, remains underexplored. The social world follows a distinct timeline and requires a richer blend of cognitive modes, from intuitive reactions (System 1) and surface-level thinking to deliberate thinking (System 2), than mathematics, which relies primarily on System 2 cognition (careful, step-by-step reasoning). Recognizing this, we introduce Temporal-aware Hierarchical Cognitive Reinforcement Learning (TimeHC-RL) to enhance LLMs' social intelligence. In our experiments, we systematically explore ways to improve LLMs' social intelligence and validate the effectiveness of TimeHC-RL by comparison with five other post-training paradigms and two test-time intervention paradigms on eight datasets with diverse data patterns. The results show that TimeHC-RL outperforms the widely adopted System 2 RL method. It enables the 7B backbone model to rival the performance of advanced models such as DeepSeek-R1 and OpenAI-O3. This systematic exploration, from both post-training and test-time intervention perspectives, also uncovers several valuable insights.