Geverifieerde detectie en preventie van concurrentie-anomalieën in multi-agent-grote-taalmodel-systemen

Samenvatting

Multi-agent LLM-systemen delen toestand via geheugenopslag, vectorindexen en toolregistraties. We modelleren dergelijk delen als langlopende lees-genereer-schrijfoperaties onder deterministische-generatiesemantiek – het regime dat duurzame-uitvoeringsengines afdwingen door deterministische herhaling – en formaliseren vier concurrentieanomalieën in TLA+: verouderde-generatie, fantoomtool, causale-cascade en tool-effect-herordening, structurele analoga van klassieke isolatieanomalieën, elk met een TLC-tegenvoorbeeld. Het uitsluitingsrooster over deze anomalieën is triviaal; de bijdrage is de mechanisch geverifieerde realiseerbaarheid en strikte scheiding van één maximale keten daarbinnen, L₀ ⊊ ... ⊊ L₄ – voor zover wij weten de eerste machinaal gecontroleerde consistentiehierarchie voor dergelijke runtimes. Een ontwikkeling van 274 Verus-verplichtingen (nul assume, nul admit; vertrouwensbasis: twee structurele axioma's en een mutex-correspondentie) bewijst dat de detectoren correct en volledig zijn ten opzichte van de specificaties en elke runtime zijn vermijdingsset. Drie ingezette Rust-runtimes realiseren L0–L1 (pessimistische vergrendeling, serialiseerbare snapshot-isolatie, standaard-SI), elk geverifieerd tegen verouderde-generatie en verfijnd tot zijn toestandsmachine; L2–L4 zijn uitvoeringsmodus-geverifieerd met afhankelijkheidsvrije preventietweelingen (A3, A6, A2: 0/1000 versus 1000/1000), en L2 wordt live uitgevoerd over drie modelfamilies (A3 voorkomen in alle 120 ingetrokken sessies). We reproduceren een stille verloren update in ByteDance's deer-flow, waarbij we de oplossing formaliseren als een geverifieerde L₀-naar-L₁-verfijning, en tonen tool-effect-herordening in LangGraph's ToolNode op ongewijzigde uitvoer, verwijderd door een L3 commit-volgorde-sequencer. De geverifieerde detector, verfijningen en realiseerbaarheidsartefacten zijn de bijdrage; de fenomenen en het rooster zijn klassiek.

English

Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L_0 subsetneq cdots subsetneq L_4, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified L_0 to L_1 refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.