
Spec Kit Agents: Context-Grounded Agentic Workflows

April 7, 2026
Authors: Pardis Taghavi, Santosh Bhavani
cs.AI

Abstract

Spec-driven development (SDD) with AI coding agents provides a structured workflow, but agents often remain "context blind" in large, evolving repositories, which leads to hallucinated APIs and architectural violations. We present Spec Kit Agents, a multi-agent SDD pipeline with project manager (PM) and developer roles that adds phase-level context-grounding hooks. Read-only probing hooks ground each phase (Specify, Plan, Tasks, Implement) in repository evidence, while validation hooks check intermediate artifacts against the environment. We evaluate 128 runs covering 32 features across five repositories. Context-grounding hooks improve judged quality by +0.15 on a 1-5 composite LLM-as-judge score (+3.0 percent of the full score; Wilcoxon signed-rank test, p < 0.05) while maintaining 99.7-100 percent repository-level test compatibility. We further evaluate the framework on SWE-bench Lite, where the augmentation hooks improve on the baseline by 1.7 percent, reaching 58.2 percent Pass@1.
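The abstract does not give implementation details, so the following is only a minimal sketch of what phase-level hooks could look like. The four phase names come from the abstract; everything else (`run_pipeline`, `PhaseResult`, the `probe`, `generate`, and `validate` callables, and the toy repository) is a hypothetical illustration, not the authors' API. A read-only probe collects repository evidence before each phase, a stand-in generator produces the phase artifact, and a validation hook checks that artifact against the repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Phase names as listed in the abstract.
PHASES = ["Specify", "Plan", "Tasks", "Implement"]


@dataclass
class PhaseResult:
    phase: str
    evidence: List[str]  # what the read-only probe found
    artifact: str        # what the agent produced for this phase
    valid: bool          # outcome of the validation hook


def run_pipeline(
    repo_files: Dict[str, str],
    probe: Callable[[str, Dict[str, str]], List[str]],
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str, Dict[str, str]], bool],
) -> List[PhaseResult]:
    """Run each phase as: probe (read-only) -> generate -> validate."""
    results = []
    for phase in PHASES:
        evidence = probe(phase, repo_files)
        artifact = generate(phase, evidence)
        results.append(
            PhaseResult(phase, evidence, artifact, validate(artifact, repo_files))
        )
    return results


# --- Toy hooks (assumptions for illustration only) ---

def probe(phase: str, repo_files: Dict[str, str]) -> List[str]:
    # Read-only: lists Python source paths and never mutates repo_files.
    return sorted(path for path in repo_files if path.endswith(".py"))


def generate(phase: str, evidence: List[str]) -> str:
    # Stand-in for an LLM agent call; it only cites paths the probe found.
    return f"{phase}: references {', '.join(evidence)}"


def generate_ungrounded(phase: str, evidence: List[str]) -> str:
    # Simulates a "context blind" agent that cites a path that does not exist.
    return f"{phase}: references app/missing.py"


def validate(artifact: str, repo_files: Dict[str, str]) -> bool:
    # Every path the artifact cites must exist in the repository.
    cited = artifact.split("references ", 1)[1].split(", ")
    return all(path in repo_files for path in cited)


repo = {"app/models.py": "class User: ...", "README.md": "# demo"}
grounded = run_pipeline(repo, probe, generate, validate)
ungrounded = run_pipeline(repo, probe, generate_ungrounded, validate)
```

In this toy setup the grounded run passes validation in every phase, while the ungrounded run fails in every phase because it cites a file that is not in the repository. A real system would replace `generate` with agent calls and `validate` with checks against the actual environment, such as running the test suite.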