Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
June 1, 2024
Authors: Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster
cs.AI
Abstract
Cultural accumulation drives the open-ended and diverse progress in
capabilities spanning human history. It builds an expanding body of knowledge
and skills by combining individual exploration with inter-generational
information transmission. Despite its widespread success among humans, the
capacity for artificial learning agents to accumulate culture remains
under-explored. In particular, approaches to reinforcement learning typically
strive for improvements over only a single lifetime. Generational algorithms
that do exist fail to capture the open-ended, emergent nature of cultural
accumulation, which allows individuals to trade off innovation and imitation.
Building on the previously demonstrated ability of reinforcement learning
agents to perform social learning, we find that training setups which balance
this with independent learning give rise to cultural accumulation. These
accumulating agents outperform those trained for a single lifetime with the
same cumulative experience. We explore this accumulation by constructing two
models under two distinct notions of a generation: episodic generations, in
which accumulation occurs via in-context learning, and train-time generations,
in which accumulation occurs via in-weights learning. In-context and in-weights
cultural accumulation can be interpreted as analogous to knowledge and skill
accumulation, respectively. To the best of our knowledge, this work is the
first to present general models that achieve emergent cultural accumulation in
reinforcement learning, opening up new avenues towards more open-ended learning
systems, as well as presenting new opportunities for modelling human culture.