Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Abstract

The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and by its quadratic computational complexity, $O(L^2)$, with respect to sequence length $L$. Current models emulate memory by reprocessing an ever-expanding conversation history at every turn, which makes long dialogues prohibitively expensive and slow. This paper introduces the Reactive Transformer (RxT), a novel architecture that overcomes these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn in real time as a discrete event, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture follows a distinct operational cycle. First, a generator-decoder produces a response conditioned on the current query and the previous memory state. Afterwards, a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic, $O(N^2 \cdot T)$, to linear, $O(N \cdot T)$, in the number of interactions $N$, where $T$ is the average number of tokens per interaction. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated the architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.
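The operational cycle and the cost argument above can be made concrete with a toy simulation. This is a minimal sketch, not the authors' implementation. The following are hypothetical stand-ins chosen for illustration: `ToyReactiveModel`, `encode`, `memory_attention`, `STM_SLOTS`, `DIM`, and the sigmoid-gated slot update. The "generator" simply echoes the query.

The sketch shows the ordering the abstract describes. The response is returned first. The STM update then runs as a background task, which must finish before the next turn reads memory.

It also counts the tokens read under the two regimes:
- A stateless model re-reads the whole history, so turn $k$ costs $k \cdot T$ tokens and $N$ turns cost $T \cdot N(N+1)/2$.
- An RxT-style model reads only the current interaction, for $N \cdot T$ in total. The constant cost of the fixed-size STM is ignored here.

```python
import asyncio
import math
import random
from typing import Optional

STM_SLOTS = 4  # hypothetical number of fixed-size memory slots
DIM = 8        # hypothetical width of each slot


def encode(interaction: str) -> list[float]:
    """Hypothetical memory-encoder stand-in: maps the full interaction
    (query + response) to a deterministic fixed-width vector."""
    rng = random.Random(interaction)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def memory_attention(stm: list[list[float]], encoded: list[float]) -> list[list[float]]:
    """Hypothetical gated update: each slot scores the encoded interaction
    (scaled dot product) and blends toward it with a sigmoid gate g,
    slot <- (1 - g) * slot + g * encoded. The STM size never changes."""
    new_stm = []
    for slot in stm:
        score = sum(s * e for s, e in zip(slot, encoded)) / math.sqrt(DIM)
        g = 1.0 / (1.0 + math.exp(-score))
        new_stm.append([(1.0 - g) * s + g * e for s, e in zip(slot, encoded)])
    return new_stm


class ToyReactiveModel:
    def __init__(self) -> None:
        self.stm = [[0.0] * DIM for _ in range(STM_SLOTS)]
        self._pending: Optional[asyncio.Task] = None

    def generate(self, query: str) -> str:
        # Generator-decoder stand-in: may use only the current query and
        # self.stm (ignored by this echo), never the raw conversation history.
        return f"reply to: {query}"

    async def _update_memory(self, query: str, response: str) -> None:
        # Memory phase: encode the complete interaction, then update the STM.
        self.stm = memory_attention(self.stm, encode(query + " " + response))

    async def interact(self, query: str) -> str:
        if self._pending is not None:
            await self._pending  # previous update must land before STM is read
        response = self.generate(query)
        # Schedule the memory update in the background; the reply returns now.
        self._pending = asyncio.create_task(self._update_memory(query, response))
        return response


def stateless_tokens(num_turns: int, tokens_per_turn: int) -> int:
    # Turn k re-reads k interactions: T * (1 + 2 + ... + N) = T * N * (N + 1) / 2.
    return sum(k * tokens_per_turn for k in range(1, num_turns + 1))


def rxt_tokens(num_turns: int, tokens_per_turn: int) -> int:
    # Each turn reads only the current interaction (fixed STM cost ignored).
    return num_turns * tokens_per_turn


async def main() -> None:
    model = ToyReactiveModel()
    for q in ["hello", "what is RxT?", "thanks"]:
        print(await model.interact(q))
    await model._pending  # flush the last background update before exiting
    print(stateless_tokens(100, 100), rxt_tokens(100, 100))  # 505000 10000


if __name__ == "__main__":
    asyncio.run(main())
```

Take a hypothetical 100 tokens per interaction over 100 turns. The stateless model reads 505,000 tokens, while the RxT-style model reads 10,000. The gap grows with every additional turn.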
