Triplet-Block Diffusion RWKV

Abstract

Causal Transformer language models suffer from strictly sequential decoding and a quadratic per-step attention cost. Linear-time causal models and discrete diffusion models each address these weaknesses, but combining them is inherently inconsistent: diffusion requires bidirectional attention, while causal models are unidirectional. To unify these architectures, we propose B^3D-RWKV, a diffusion RWKV variant that uses a triplet-block layout method to combine RWKV's O(L) inference efficiency with parallel, bidirectional discrete diffusion. B^3D-RWKV-7.2B reaches accuracy comparable to existing models on an 8-task suite while significantly outperforming baselines in decoding throughput, with an average speedup of 1.6×.