TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning - An Approach to Enhancing the Social Intelligence of Large Language Models -

Abstract

Recently, Large Language Models (LLMs) have made significant progress in IQ-related domains that require careful thinking, such as mathematics and coding. However, improving LLMs' cognitive development in social domains, particularly from a post-training perspective, remains underexplored. Mathematics relies primarily on System 2 cognition (careful, step-by-step reasoning). The social world, by contrast, follows its own timeline and requires a richer blend of cognitive modes, ranging from intuitive reactions (System 1) and surface-level thinking to deliberate thinking (System 2). Motivated by this observation, we introduce Temporal-aware Hierarchical Cognitive Reinforcement Learning (TimeHC-RL) for enhancing LLMs' social intelligence. In our experiments, we systematically explore ways to improve LLMs' social intelligence and validate the effectiveness of TimeHC-RL. We do this through five other post-training paradigms and two test-time intervention paradigms on eight datasets with diverse data patterns. Experimental results show that TimeHC-RL outperforms the widely adopted System 2 RL method. It enables a 7B backbone model to rival the performance of advanced models such as DeepSeek-R1 and OpenAI-O3. In addition, this systematic exploration of post-training and test-time interventions for improving LLMs' social intelligence yields several valuable insights.