AAVGen: Precision Engineering of Adeno-associated Viral Capsids for Renal Selective Targeting
February 21, 2026
Authors: Mohammadreza Ghaffarzadeh-Esfahani, Yousof Gheisari
cs.AI
Abstract
Adeno-associated viruses (AAVs) are promising vectors for gene therapy, but their native serotypes face limitations in tissue tropism, immune evasion, and production efficiency. Engineering capsids to overcome these hurdles is challenging because of the vast sequence space and the difficulty of optimizing multiple functional properties simultaneously. The challenge is compounded for the kidney, whose unique anatomical barriers and cellular targets demand precise and efficient vector engineering. Here, we present AAVGen, a generative artificial intelligence framework for the de novo design of AAV capsids with enhanced multi-trait profiles. AAVGen combines a protein language model (PLM) with supervised fine-tuning (SFT) and a reinforcement learning technique termed Group Sequence Policy Optimization (GSPO). The model is guided by a composite reward signal derived from three ESM-2-based regression predictors, each trained to predict one key property: production fitness, kidney tropism, or thermostability. Our results demonstrate that AAVGen produces a diverse library of novel VP1 protein sequences. In silico validation revealed that most of the generated variants show superior performance across all three indices, indicating successful multi-objective optimization. Furthermore, structural analysis with AlphaFold3 confirms that the generated sequences preserve the canonical capsid fold despite their sequence diversity. AAVGen establishes a foundation for data-driven viral vector engineering, accelerating the development of next-generation AAV vectors with tailored functional characteristics.
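To make the reward-and-update loop more concrete, here is a minimal sketch of how a composite reward could drive a GSPO-style update. It rests on assumptions the abstract does not state: the three predictor outputs are combined as a weighted sum, and GSPO follows its published formulation, which uses rewards normalized within a group of sequences sampled for the same prompt and a length-normalized, sequence-level importance ratio inside a clipped surrogate objective. The function names, the weights, and the clip value are hypothetical and for illustration only. This is not AAVGen's actual implementation.

```python
import math
from typing import Sequence


def composite_reward(production: float, kidney: float, thermo: float,
                     weights: tuple = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical weighted sum of the three predictor scores
    (production fitness, kidney tropism, thermostability)."""
    w_p, w_k, w_t = weights
    return w_p * production + w_k * kidney + w_t * thermo


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list:
    """Normalize rewards within one group of sequences sampled for the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def gspo_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                   lengths: Sequence[int], advantages: Sequence[float],
                   clip_eps: float = 0.2) -> float:
    """Clipped surrogate objective (to be maximized) with sequence-level ratios.

    logp_new / logp_old hold the summed token log-probabilities of each whole
    sequence under the current and the sampling policy. clip_eps is illustrative.
    """
    total = 0.0
    for lp_new, lp_old, length, adv in zip(logp_new, logp_old, lengths, advantages):
        # Length-normalized importance ratio: (pi_new(y) / pi_old(y)) ** (1 / |y|)
        ratio = math.exp((lp_new - lp_old) / length)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, a group of three sampled VP1 sequences with composite rewards `[1.0, 2.0, 3.0]` receives advantages of roughly `[-1.22, 0.0, 1.22]`. Normalizing per sequence rather than per token matches the fact that each predictor scores a complete capsid protein.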