AVO: Agentic Variation Operators for Autonomous Evolutionary Search
March 25, 2026
Authors: Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi
cs.AI
Abstract
Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO instantiates variation as a self-directed agent loop that can consult the current lineage, a domain-specific knowledge base, and execution feedback to propose, repair, critique, and verify implementation edits. We evaluate AVO on attention, among the most aggressively optimized kernel targets in AI, on NVIDIA Blackwell (B200) GPUs. Over 7 days of continuous autonomous evolution on multi-head attention, AVO discovers kernels that outperform cuDNN by up to 3.5% and FlashAttention-4 by up to 10.5% across the evaluated configurations. The discovered optimizations transfer readily to grouped-query attention, requiring only 30 minutes of additional autonomous adaptation and yielding gains of up to 7.0% over cuDNN and 9.3% over FlashAttention-4. Together, these results show that agentic variation operators move beyond prior LLM-in-the-loop evolutionary pipelines by elevating the agent from candidate generator to variation operator, and can discover performance-critical micro-architectural optimizations that produce kernels surpassing state-of-the-art expert-engineered attention implementations on today's most advanced GPU hardware.
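To make the core idea concrete, the following is a minimal sketch of how an agentic variation operator might sit inside an evolutionary loop. The paper does not publish an API; every name here (the agent's propose_edit and critique_accepts methods, the knowledge-base retrieve call, the evaluate harness, the turn and population budgets) is a hypothetical placeholder illustrating the proposed loop structure, not the authors' implementation.

```python
"""Hypothetical sketch of an Agentic Variation Operator (AVO) in an
evolutionary search loop. All interfaces below are assumptions for
illustration, not the paper's published API."""
from dataclasses import dataclass, field

MAX_AGENT_TURNS = 8  # inner agent-loop budget per variation (assumed)


@dataclass
class Candidate:
    source: str                                        # kernel source code
    fitness: float = float("-inf")                     # measured speedup (higher is better)
    lineage: list[str] = field(default_factory=list)   # summaries of ancestor edits


def agentic_variation(agent, parent: Candidate, kb, evaluate) -> Candidate:
    """One AVO step: instead of a fixed mutation/crossover rule, a
    self-directed agent proposes, repairs, critiques, and verifies an edit."""
    child, feedback = parent, None
    for _ in range(MAX_AGENT_TURNS):
        edit = agent.propose_edit(                     # propose (or repair, given feedback)
            source=child.source,
            lineage=parent.lineage,                    # consult the current lineage
            knowledge=kb.retrieve(child.source),       # domain-specific knowledge base
            feedback=feedback,                         # compiler errors / profiler output
        )
        child = Candidate(
            source=edit.apply(child.source),
            lineage=parent.lineage + [edit.summary],
        )
        ok, child.fitness, feedback = evaluate(child)  # compile, check correctness, benchmark
        if ok and agent.critique_accepts(child, feedback):
            return child                               # agent verifies and commits the edit
    return parent                                      # fall back if no edit is verified


def evolve(population, agent, kb, evaluate, generations=100):
    """Outer evolutionary loop: selection stays classical; only the
    variation operator is agentic."""
    for _ in range(generations):
        parent = max(population, key=lambda c: c.fitness)      # elitist selection (assumed)
        population.append(agentic_variation(agent, parent, kb, evaluate))
        population.sort(key=lambda c: c.fitness, reverse=True)
        population = population[:16]                           # population cap (assumed)
    return population[0]
```

The design point this sketch tries to capture is the abstract's central claim: the language model is not a one-shot candidate generator inside a prescribed pipeline, but owns the entire inner variation loop, iterating against execution feedback until an edit is verified or its turn budget runs out.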