Streaming Structured Inference with Flash-SemiCRF
April 20, 2026
Authors: Benjamin K. Johnson, Thomas Goralski, Ayush Semwal, Hui Shen, H. Josh Jang
cs.AI
Abstract
Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at their boundaries. However, existing implementations must materialize a large edge potential tensor whose size grows with sequence length, maximum segment length, and label count, becoming prohibitive for speech-scale state spaces and intractable at genomic scales where sequences can exceed 100,000 positions. This memory bottleneck has limited the adoption of exact segment-level inference for long sequences and large label sets. We identify the core inefficiency as materializing edge potentials that can instead be evaluated on the fly from a compact prefix-sum array, and we make three improvements. First, replacing the stored edge tensor with prefix-sum lookup reduces the memory footprint by a factor proportional to the product of segment length and label count. Second, a streaming forward-backward pass with checkpoint-boundary normalization keeps working memory sublinear in sequence length while preserving exact gradients. Third, zero-centered cumulative scores control numerical drift and induce an adaptive duration prior under label imbalance. We integrate these ideas into Flash-SemiCRF, a fused Triton kernel that enables exact semi-CRF inference on previously intractable problem sizes. Available at https://github.com/biobenkj/flash-semicrf.
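To make the prefix-sum idea concrete, here is a minimal pure-Python sketch of a semi-CRF forward pass that looks up each segment score from cumulative sums instead of reading it from a stored edge tensor. It is an illustration under stated assumptions, not the Flash-SemiCRF implementation: the function names, the additive scoring form (summed per-position emissions plus a per-label duration bias plus a label transition score), and the argument layout are all hypothetical. A brute-force enumeration over every segmentation is included only to check that the lookup-based recursion computes the same log-partition value.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def prefix_sums(emissions):
    """cum[t][c] = sum of emissions[0..t-1][c]; holds (T + 1) * C numbers."""
    num_labels = len(emissions[0])
    cum = [[0.0] * num_labels]
    for row in emissions:
        cum.append([cum[-1][c] + row[c] for c in range(num_labels)])
    return cum


def log_partition_prefix_sum(emissions, transitions, duration_bias, max_len):
    """Forward recursion; segment scores come from two prefix-sum lookups.

    Assumed (hypothetical) scoring: a segment [s, t) with label c scores
    cum[t][c] - cum[s][c] + duration_bias[c][t - s - 1], plus
    transitions[prev][c] when it follows a segment labeled prev.
    """
    T, C = len(emissions), len(emissions[0])
    cum = prefix_sums(emissions)
    # alpha[t][c]: log-sum over segmentations of [0, t) ending in label c.
    # The full table is kept for clarity; no checkpointing is attempted here.
    alpha = [[-math.inf] * C for _ in range(T + 1)]
    for t in range(1, T + 1):
        for c in range(C):
            terms = []
            for d in range(1, min(max_len, t) + 1):
                s = t - d
                seg = cum[t][c] - cum[s][c] + duration_bias[c][d - 1]
                if s == 0:
                    terms.append(seg)  # first segment: no incoming transition
                else:
                    incoming = [alpha[s][p] + transitions[p][c] for p in range(C)]
                    terms.append(seg + logsumexp(incoming))
            alpha[t][c] = logsumexp(terms)
    return logsumexp(alpha[T])


def log_partition_brute_force(emissions, transitions, duration_bias, max_len):
    """Reference: enumerate every labeled segmentation (tiny inputs only)."""
    T, C = len(emissions), len(emissions[0])
    totals = []

    def extend(start, prev, total):
        if start == T:
            totals.append(total)
            return
        for d in range(1, min(max_len, T - start) + 1):
            for c in range(C):
                seg = sum(emissions[t][c] for t in range(start, start + d))
                seg += duration_bias[c][d - 1]
                trans = 0.0 if prev is None else transitions[prev][c]
                extend(start + d, c, total + seg + trans)

    extend(0, None, 0.0)
    return logsumexp(totals)
```

If one assumes the materialized edge tensor is indexed by (position, duration, previous label, current label), it holds on the order of T x K x C x C entries, while the prefix-sum array above holds (T + 1) x C. The ratio is roughly K x C, which is in line with the reduction factor described in the abstract. The sketch does not attempt the checkpoint-boundary normalization, the zero-centering of cumulative scores, or the fused Triton kernel.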