Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
May 15, 2026
Authors: Alberto Pepe, Chien-Yu Lin, Despoina Magka, Bilge Acun, Yannan Nellie Wu, Anton Protopopov, Carole-Jean Wu, Yoram Bachrach
cs.AI
Abstract
Toward recursive self-improvement, we investigate LLM agents autonomously designing foundation models beyond standard Transformers. We introduce a dual-framework approach: AIRA-Compose for high-level architecture search, and AIRA-Design for low-level mechanistic implementation. AIRA-Compose uses 11 agents to explore fundamental computational primitives under a 24-hour budget. Agents evaluate million-parameter candidates, extrapolating top designs to 350M, 1B, and 3B scales. This yields 14 architectures across two families: AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba). Pre-trained at 1B scale, these consistently outperform Llama 3.2 and Composer-found baselines. On downstream tasks, AIRAformer-D and AIRAhybrid-D improve accuracy by 2.4% and 3.8% over Llama 3.2. Furthermore, AIRA-Compose finds models with highly efficient scaling frontiers: AIRAformer-C scales 54% and 71% faster than Llama 3.2 and Composer's best Transformer, while AIRAhybrid-C outscales Nemotron-2 by 23% and Composer's best hybrid by 37%. AIRA-Design tasks 20 agents with writing novel attention mechanisms for long-range dependencies and high-performing training scripts. On the Long Range Arena benchmark, agent-designed architectures reach within 2.3% and 2.6% of human state-of-the-art on document matching and text classification. On the Autoresearch benchmark, Greedy Opus 4.5 achieves 0.968 validation bits-per-byte under a fixed time budget, surpassing the published minimum. Together, these frameworks show AI agents can autonomously discover architectures and algorithmic optimizations matching or surpassing hand-designed baselines. This establishes a powerful paradigm for discovering next-generation foundation models, marking a clear step toward recursive self-improvement.
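For readers unfamiliar with the bits-per-byte metric quoted for the Autoresearch result, the sketch below shows the conventional definition: total negative log-likelihood converted from nats to bits, then divided by the number of raw bytes in the evaluated text. The abstract does not say how the benchmark computes the figure, so the function names and the assumption that the loss is reported in nats are illustrative only.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    Assumption: the loss is a natural-log cross-entropy summed over all
    predicted tokens, and total_bytes counts the UTF-8 bytes of that text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes
    # away the tokenizer, so models with different vocabularies are comparable.
    return total_nll_nats / (math.log(2) * total_bytes)


def bits_per_byte_from_mean_loss(
    mean_loss_nats: float, num_tokens: int, num_bytes: int
) -> float:
    """Same metric, starting from a per-token mean loss (hypothetical helper)."""
    return bits_per_byte(mean_loss_nats * num_tokens, num_bytes)
```

Lower is better. Because the denominator is bytes rather than tokens, the metric does not reward a model simply for using a coarser tokenizer.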