多智能體大型語言模型系統中並發異常的經驗證偵測與預防
Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems
June 15, 2026
作者: Sajjad Khan
cs.AI
摘要
多智能體大型語言模型系統透過記憶儲存、向量索引及工具註冊表共享狀態。我們將此類共享建模為在確定性生成語義(即持久執行引擎透過確定性重播所執行的機制)下的長期讀取-生成-寫入操作,並在TLA+中形式化四種並發異常:過時生成(stale-generation)、幻影工具(phantom-tool)、因果串聯(causal-cascade)及工具效應重排序(tool-effect reordering),這些是經典隔離異常的結構類比,每種異常均附有TLC反例。這些異常上的排除格(exclusion lattice)是平凡的;其貢獻在於機械驗證的可實現性及其中一條最大鏈的嚴格分離,即L₀ ⊊ ⋯ ⊊ L₄,據我們所知,這是此類運行時首個經機器檢查的一致性層級體系。一項包含274項Verus義務(零假設、零容許;信任基礎:兩條結構公理及一個互斥對應)的開發工作,證明了檢測器相對於規範的完備性與可靠性,以及每個運行時對其避免集合的滿足性。三個已部署的Rust運行時實現了L₀-L₁(悲觀鎖定、可序列化快照隔離、預設快照隔離),各運行時均已針對過時生成進行驗證,並細化至其狀態機;L₂-L₄經執行模式驗證,並配備無依賴預防對偶(A3、A6、A2:0/1000對比1000/1000),其中L₂在三個模型家族中進行了即時運行(A3在所有120次撤銷會話中均被預防)。我們重現了位元組跳動deer-flow中的一個隱匿遺失更新,將其修復形式化為一個經驗證的L₀至L₁細化,並在LangGraph的ToolNode上展示了對未修改輸出的工具效應重排序,該問題已透過L₃提交順序排序器消除。經驗證的檢測器、細化及可實現性產物是本文的貢獻;所述現象與格架構則屬於經典範疇。
English
Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L_0 subsetneq cdots subsetneq L_4, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified L_0 to L_1 refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.