Streaming Structured Inference with Flash-SemiCRF
April 20, 2026
Authors: Benjamin K. Johnson, Thomas Goralski, Ayush Semwal, Hui Shen, H. Josh Jang
cs.AI
Abstract
Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at segment boundaries. However, existing implementations must materialize a large edge potential tensor whose size grows with sequence length, maximum segment length, and label count. This becomes prohibitive for speech-scale state spaces and intractable at genomic scales, where sequences can exceed 100,000 positions. The memory bottleneck has limited the adoption of exact segment-level inference for long sequences and large label sets. We identify the core inefficiency as materializing edge potentials that can instead be evaluated on the fly from a compact prefix-sum array, and make three improvements. First, replacing the stored edge tensor with prefix-sum lookups reduces the memory footprint by a factor proportional to the product of maximum segment length and label count. Second, a streaming forward-backward pass with checkpoint-boundary normalization keeps working memory sublinear in sequence length while preserving exact gradients. Third, zero-centered cumulative scores control numerical drift and induce an adaptive duration prior under label imbalance. We integrate these ideas into Flash-SemiCRF, a fused Triton kernel that enables exact semi-CRF inference on previously intractable problem sizes. Code is available at https://github.com/biobenkj/flash-semicrf.
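To make the prefix-sum idea concrete, the following is a minimal pure-Python sketch of a semi-CRF forward pass, not the Flash-SemiCRF kernel itself. It assumes that a segment's score is the sum of per-position label scores plus a label-to-label transition score, so each edge potential can be read off as a difference of two prefix-sum entries instead of being stored. The function names (`prefix_sums`, `log_partition`, `brute_force_log_partition`) and the ring buffer of the last K forward rows are illustrative choices made here. The checkpointed backward pass, checkpoint-boundary normalization, and zero-centering described in the abstract are not shown.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def prefix_sums(scores):
    """Return P with P[j][y] = sum of scores[t][y] over t < j (T+1 rows)."""
    num_labels = len(scores[0])
    P = [[0.0] * num_labels]
    for row in scores:
        P.append([P[-1][y] + row[y] for y in range(num_labels)])
    return P


def log_partition(scores, trans, K):
    """Log-partition over all segmentations with segment length <= K.

    Assumption for this sketch: the score of segment [i, j) with label y is
    P[j][y] - P[i][y], plus trans[prev][y] when a previous segment exists.
    """
    T, C = len(scores), len(scores[0])
    P = prefix_sums(scores)
    # Ring buffer: slot (i % K) holds the forward row for boundary i.
    ring = [[-math.inf] * C for _ in range(K)]
    for j in range(1, T + 1):
        row = []
        for y in range(C):
            terms = []
            for k in range(1, min(K, j) + 1):
                i = j - k
                seg = P[j][y] - P[i][y]  # edge potential computed on the fly
                if i == 0:
                    terms.append(seg)  # first segment: no incoming transition
                else:
                    prev = ring[i % K]
                    terms.append(
                        seg + logsumexp([prev[yp] + trans[yp][y] for yp in range(C)])
                    )
            row.append(logsumexp(terms))
        ring[j % K] = row  # overwrites boundary j - K, which is no longer needed
    return logsumexp(ring[T % K])


def brute_force_log_partition(scores, trans, K):
    """Reference: enumerate every labeled segmentation (small inputs only)."""
    T, C = len(scores), len(scores[0])
    totals = []

    def extend(start, prev_label, acc):
        if start == T:
            totals.append(acc)
            return
        for k in range(1, min(K, T - start) + 1):
            for y in range(C):
                seg = sum(scores[t][y] for t in range(start, start + k))
                step = seg if prev_label is None else seg + trans[prev_label][y]
                extend(start + k, y, acc + step)

    extend(0, None, 0.0)
    return logsumexp(totals)


if __name__ == "__main__":
    scores = [[0.3, -0.2], [1.1, 0.4], [-0.5, 0.9], [0.2, 0.2], [0.7, -1.0], [0.0, 0.6]]
    trans = [[-0.1, 0.5], [0.3, -0.4]]
    print(log_partition(scores, trans, K=3))
    print(brute_force_log_partition(scores, trans, K=3))
```

In this sketch the forward state is K rows of C values, and the only array that grows with sequence length is the (T+1) by C prefix-sum table. A materialized edge tensor would instead need on the order of T by K by C by C entries when transitions are folded in. The brute-force function exists only to check the recursion on small inputs.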