Flow Equivariant Recurrent Neural Networks
July 20, 2025
Author: T. Anderson Keller
cs.AI
Abstract
Data arrives at our senses as a continuous stream, smoothly transforming from
one instant to the next. These smooth transformations can be viewed as
continuous symmetries of the environment that we inhabit, defining equivalence
relations between stimuli over time. In machine learning, neural network
architectures that respect symmetries of their data are called equivariant and
have provable benefits in terms of generalization ability and sample
efficiency. To date, however, equivariance has been considered only for static
transformations and feed-forward networks, limiting its applicability to
sequence models, such as recurrent neural networks (RNNs), and corresponding
time-parameterized sequence transformations. In this work, we extend
equivariant network theory to this regime of 'flows' -- one-parameter Lie
subgroups capturing natural transformations over time, such as visual motion.
We begin by showing that standard RNNs are generally not flow equivariant:
their hidden states fail to transform in a geometrically structured manner for
moving stimuli. We then show how flow equivariance can be introduced, and
demonstrate that these models significantly outperform their non-equivariant
counterparts in terms of training speed, length generalization, and velocity
generalization, on both next step prediction and sequence classification. We
present this work as a first step towards building sequence models that respect
the time-parameterized symmetries which govern the world around us.
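As a point of reference, here is a minimal sketch of the equivariance condition at stake; the notation (flow $\psi_\nu$, hidden state $h_t$) is our own shorthand and is not taken verbatim from the paper. A feed-forward network $f$ is equivariant to a group $G$ when $f(g \cdot x) = g \cdot f(x)$ for all $g \in G$. A flow is a one-parameter subgroup $\psi_\nu(t)$, e.g. translation at velocity $\nu$, and a recurrent model with hidden state $h_t$ would be flow equivariant if flowing the input sequence flows the hidden-state sequence in lockstep:

\[
h_t\big(\{\psi_\nu(s) \cdot x_s\}_{s \le t}\big) \;=\; \psi_\nu(t) \cdot h_t\big(\{x_s\}_{s \le t}\big) \qquad \text{for all } t, \nu .
\]

Read this way, the abstract's claim is that standard RNNs violate this identity for $\nu \neq 0$, while the proposed models satisfy it by construction.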