
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

October 3, 2025
Author: Adam Filipek
cs.AI

Abstract

The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and the quadratic computational complexity $O(L^2)$ with respect to sequence length $L$. Current models emulate memory by reprocessing an ever-expanding conversation history with each turn, leading to prohibitive costs and latency in long dialogues. This paper introduces the Reactive Transformer (RxT), a novel architecture designed to overcome these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn as a discrete event in real-time, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture features a distinct operational cycle where a generator-decoder produces a response based on the current query and the previous memory state, after which a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic ($O(N^2 \cdot T)$) to linear ($O(N \cdot T)$) with respect to the number of interactions $N$. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated our architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.
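The operational cycle described above lends itself to a short illustration. The following is a minimal, hypothetical PyTorch sketch reconstructed from the abstract alone: the class and method names (ReactiveTransformerSketch, MemoryAttention, respond, update_memory), the slot count, and all dimensions are our own assumptions, and stock nn.Transformer layers stand in for the paper's actual generator-decoder, memory-encoder, and Memory Attention network.

```python
import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Folds an encoded interaction into the fixed-size STM slots (illustrative)."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, stm: torch.Tensor, encoded: torch.Tensor) -> torch.Tensor:
        # STM slots act as queries; the encoded interaction supplies keys and
        # values, so memory size stays constant regardless of dialogue length.
        update, _ = self.attn(stm, encoded, encoded)
        return self.norm(stm + update)

class ReactiveTransformerSketch(nn.Module):
    """Toy event-driven cycle: respond first, consolidate memory afterwards."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, stm_slots: int = 64):
        super().__init__()
        # Fixed-size Short-Term Memory: stm_slots learned vectors (assumed size).
        self.stm0 = nn.Parameter(torch.randn(1, stm_slots, d_model))
        self.generator_decoder = nn.TransformerDecoderLayer(
            d_model, n_heads, batch_first=True)
        self.memory_encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.memory_attention = MemoryAttention(d_model, n_heads)

    def respond(self, query_emb: torch.Tensor, stm: torch.Tensor) -> torch.Tensor:
        # Synchronous path: the decoder cross-attends to the *previous* STM,
        # never to the raw conversation history.
        return self.generator_decoder(query_emb, stm)

    def update_memory(self, stm: torch.Tensor, interaction: torch.Tensor) -> torch.Tensor:
        # Asynchronous path (run after the reply is delivered in RxT proper):
        # encode the complete interaction, then merge it into the STM.
        return self.memory_attention(stm, self.memory_encoder(interaction))

# Per-turn work is O(T) against a constant-size memory, so N turns cost
# O(N * T) rather than the O(N^2 * T) of reprocessing the full history.
model = ReactiveTransformerSketch()
stm = model.stm0
for _ in range(3):                                  # three conversational events
    query = torch.randn(1, 16, 256)                 # embedded user query (T = 16)
    reply = model.respond(query, stm)               # low-latency generation
    interaction = torch.cat([query, reply], dim=1)  # full interaction: query + response
    stm = model.update_memory(stm, interaction)     # memory consolidation
```

The design choice the sketch mirrors is the decoupling the abstract emphasizes: respond reads the previous memory state and returns immediately, while update_memory can run off the critical path after the reply is delivered, which is what keeps user-facing latency constant as the number of interactions grows.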