
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

October 3, 2025
Author: Adam Filipek
cs.AI

Abstract

The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and the quadratic computational complexity (O(L²)) with respect to sequence length L. Current models emulate memory by reprocessing an ever-expanding conversation history with each turn, leading to prohibitive costs and latency in long dialogues. This paper introduces the Reactive Transformer (RxT), a novel architecture designed to overcome these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn as a discrete event in real-time, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture features a distinct operational cycle where a generator-decoder produces a response based on the current query and the previous memory state, after which a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic (O(N² · T)) to linear (O(N · T)) with respect to the number of interactions N. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated our architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.
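
A minimal Python sketch of the operational cycle described in the abstract may help make the asynchronous split concrete. The component names and interfaces here (the generator_decoder, memory_encoder, and memory_attention callables and the on_query entry point) are illustrative assumptions, not the paper's actual API.

```python
# Sketch of the RxT event-driven cycle, assuming the three components are
# provided as callables. The response path reads only the fixed-size STM,
# so per-turn cost does not grow with conversation length; memory
# consolidation happens off the user-facing path.

import asyncio


class ReactiveTransformerSketch:
    def __init__(self, generator_decoder, memory_encoder, memory_attention, stm_init):
        self.generator_decoder = generator_decoder  # answers the current query
        self.memory_encoder = memory_encoder        # encodes the full interaction
        self.memory_attention = memory_attention    # merges it into the STM
        self.stm = stm_init                         # fixed-size Short-Term Memory state

    async def on_query(self, query: str) -> str:
        # Synchronous, user-facing step: generate a response conditioned on the
        # current query and the *previous* memory state only.
        response = self.generator_decoder(query, self.stm)

        # Asynchronous step: update the STM in the background so the user does
        # not wait on memory consolidation.
        self._pending_update = asyncio.create_task(self._update_memory(query, response))
        return response

    async def _update_memory(self, query: str, response: str) -> None:
        # Encode the complete interaction and fold it into the fixed-size STM.
        interaction = self.memory_encoder(query, response)
        self.stm = self.memory_attention(self.stm, interaction)
```

Because each turn touches only the current query plus a fixed-size memory state, the total cost over N interactions of average length T grows linearly (O(N · T)) rather than quadratically, which is the scaling claim made above.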