Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning

June 1, 2024
Authors: Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster
cs.AI

Abstract

Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history. It builds an expanding body of knowledge and skills by combining individual exploration with inter-generational information transmission. Despite its widespread success among humans, the capacity for artificial learning agents to accumulate culture remains under-explored. In particular, approaches to reinforcement learning typically strive for improvements over only a single lifetime. Generational algorithms that do exist fail to capture the open-ended, emergent nature of cultural accumulation, which allows individuals to trade off innovation and imitation. Building on the previously demonstrated ability of reinforcement learning agents to perform social learning, we find that training setups that balance social learning with independent learning give rise to cultural accumulation. These accumulating agents outperform those trained for a single lifetime with the same cumulative experience. We explore this accumulation by constructing two models under two distinct notions of a generation: episodic generations, in which accumulation occurs via in-context learning, and train-time generations, in which accumulation occurs via in-weights learning. In-context and in-weights cultural accumulation can be interpreted as analogous to knowledge and skill accumulation, respectively. To the best of our knowledge, this work is the first to present general models that achieve emergent cultural accumulation in reinforcement learning, opening up new avenues towards more open-ended learning systems, as well as presenting new opportunities for modelling human culture.

