Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Abstract

The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and its quadratic computational complexity, $O(L^2)$, with respect to sequence length $L$. Current models emulate memory by reprocessing an ever-expanding conversation history with each turn, leading to prohibitive costs and latency in long dialogues. This paper introduces the Reactive Transformer (RxT), a novel architecture designed to overcome these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn as a discrete event in real time, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture features a distinct operational cycle in which a generator-decoder produces a response based on the current query and the previous memory state, after which a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic, $O(N^2 \cdot T)$, to linear, $O(N \cdot T)$, with respect to the number of interactions $N$. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated our architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.
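
The scaling claim can be illustrated with a small token-counting sketch. It is not taken from the paper: the function names are hypothetical, T is assumed to denote the average number of tokens per interaction (the abstract does not define it), and only prompt-side tokens are counted. Under those assumptions, a stateless model that re-reads the full history at turn k processes about k * T tokens, which sums to T * N(N+1)/2 over N turns, whereas an event-driven model that reads only the current interaction plus a fixed-size memory processes about T tokens per turn.

```python
def stateless_tokens(num_turns: int, tokens_per_turn: int) -> int:
    """Tokens processed when every turn re-reads the whole history so far.

    Turn k sees k * tokens_per_turn tokens, so the total grows as O(N^2 * T).
    """
    return sum(turn * tokens_per_turn for turn in range(1, num_turns + 1))


def event_driven_tokens(num_turns: int, tokens_per_turn: int) -> int:
    """Tokens processed when each turn reads only the current interaction.

    Assumption: the fixed-size memory adds a per-turn cost that does not
    depend on the turn index, so it is left out of this count. The total
    grows as O(N * T).
    """
    return num_turns * tokens_per_turn


if __name__ == "__main__":
    # Hypothetical conversation sizes, 50 tokens per interaction.
    for n in (1, 10, 100):
        print(n, stateless_tokens(n, 50), event_driven_tokens(n, 50))
```

With 50 tokens per interaction, the two counts are equal at one turn and diverge quickly: at 100 turns the stateless count is 252,500 tokens against 5,000 for the event-driven count.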
