AVO: Agentic Variation Operators for Autonomous Evolutionary Search
March 25, 2026
作者: Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi
cs.AI
Abstract
Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO instantiates variation as a self-directed agent loop that can consult the current lineage, a domain-specific knowledge base, and execution feedback to propose, repair, critique, and verify implementation edits. We evaluate AVO on attention kernels, among the most aggressively optimized targets in AI, running on NVIDIA Blackwell (B200) GPUs. Over 7 days of continuous autonomous evolution on multi-head attention, AVO discovers kernels that outperform cuDNN by up to 3.5% and FlashAttention-4 by up to 10.5% across the evaluated configurations. The discovered optimizations transfer readily to grouped-query attention, requiring only 30 minutes of additional autonomous adaptation and yielding gains of up to 7.0% over cuDNN and 9.3% over FlashAttention-4. Together, these results show that agentic variation operators move beyond prior LLM-in-the-loop evolutionary pipelines by elevating the agent from candidate generator to variation operator, and can discover performance-critical micro-architectural optimizations that produce kernels surpassing state-of-the-art expert-engineered attention implementations on today's most advanced GPU hardware.
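The abstract describes variation as an agent loop that consults the lineage, a knowledge base, and execution feedback to propose, repair, critique, and verify edits. The following is a minimal, hypothetical Python sketch of that control flow on a toy optimization problem; the function names, the scalar "candidate" representation, and the toy fitness function are illustrative assumptions and do not reflect the paper's actual kernel-generation implementation.

```python
import random

def evaluate(candidate):
    """Toy stand-in for kernel benchmarking: higher is better (optimum at 42)."""
    return -abs(candidate - 42)

def agent_propose(parent, lineage, knowledge_base, feedback):
    """Stand-in for the coding agent proposing an edit, informed by context."""
    step = knowledge_base.get("step", 1)
    return parent + random.choice([-step, step])

def agent_verify(candidate):
    """Stand-in for verification (e.g., compilation and numerical checks)."""
    return isinstance(candidate, int)

def avo_variation(parent, lineage, knowledge_base, max_attempts=5):
    """Agentic variation: propose, verify, critique via execution feedback, retry."""
    feedback = None
    for _ in range(max_attempts):
        child = agent_propose(parent, lineage, knowledge_base, feedback)
        if not agent_verify(child):
            feedback = "verification failed"  # agent repairs on the next attempt
            continue
        feedback = evaluate(child)  # execution feedback drives the critique step
        if feedback > evaluate(parent):
            return child  # accept a verified, improving edit
    return parent  # no verified improvement found; keep the parent

def evolve(seed, generations=200):
    """Simple (1+1)-style evolutionary loop using the agentic operator."""
    lineage = [seed]
    knowledge_base = {"step": 1}  # toy domain knowledge
    best = seed
    for _ in range(generations):
        child = avo_variation(best, lineage, knowledge_base)
        lineage.append(child)
        if evaluate(child) >= evaluate(best):
            best = child
    return best

random.seed(0)
best = evolve(0)
```

The key design point mirrored here is that the agent, not a fixed mutation rule, decides what edit to make and may iterate internally (repair and retry) before the evolutionary loop ever sees the candidate.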