多智能体大语言模型系统中并发异常的验证检测与预防
Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems
June 15, 2026
作者: Sajjad Khan
cs.AI
摘要
多智能体LLM系统通过存储仓库、向量索引和工具注册表共享状态。我们将此类共享建模为在确定性生成语义下的长期读取-生成-写入操作——这是持久执行引擎通过确定性重放所强化的机制——并在TLA+中形式化了四种并发异常:陈旧生成、幻象工具、因果级联和工具效应重排序,这些异常结构上类似于经典隔离异常,每种均配有TLC反例。这些异常上的排除格是平凡的;贡献在于机械验证了其内一个极大链L₀ ⊊ ⋯ ⊊ L₄的可实现性与严格分离——据我们所知,这是此类运行时首个经机器校验的一致性层次结构。274个Verus验证义务(零assume、零admit;信任基础:两条结构公理及一个互斥对应关系)证明了检测器相对于规约的可靠性与完备性,以及各运行时的回避集。三个已部署的Rust运行时实现了L₀-L₁(悲观锁、可序列化快照隔离、默认SI),每个均经陈旧生成验证并精化至其状态机;L₂-L₄通过执行模式验证,采用无依赖预防双生技术(A3、A6、A2:0/1000 vs 1000/1000),L₂在三族模型上在线运行(120个撤回会话中均预防A3)。我们复现了字节跳动deer-flow中一个静默丢失更新问题,将其修复形式化为从L₀到L₁的已验证精化,并在LangGraph的ToolNode未修改输出中展示了工具效应重排序,该问题通过L₃提交顺序序列器消除。验证后的检测器、精化与可实现性制品为本工作贡献;现象与格结构属于经典范畴。
English
Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L_0 subsetneq cdots subsetneq L_4, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified L_0 to L_1 refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.