General Multimodal Protein Design Enables DNA-Encoding of Chemistry
April 6, 2026
Authors: Jarrid Rector-Brooks, Théophile Lambert, Marta Skreta, Daniel Roth, Yueming Long, Zi-Qi Li, Xi Zhang, Miruna Cretu, Francesca-Zhoufan Li, Tanvi Ganapathy, Emily Jin, Avishek Joey Bose, Jason Yang, Kirill Neklyudov, Yoshua Bengio, Alexander Tong, Frances H. Arnold, Cheng-Hao Liu
cs.AI
Abstract
Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none have created enzymes without pre-specifying catalytic residues. We introduce DISCO (DIffusion for Sequence-structure CO-design), a multimodal model that co-designs protein sequence and 3D structure around arbitrary biomolecules, as well as inference-time scaling methods that optimize objectives across both modalities. Conditioned solely on reactive intermediates, DISCO designs diverse heme enzymes with novel active-site geometries. These enzymes catalyze new-to-nature carbene-transfer reactions, including alkene cyclopropanation, spirocyclopropanation, and B-H and C(sp³)-H insertions, with high activities exceeding those of engineered enzymes. Random mutagenesis of a selected design further confirmed that enzyme activity can be improved through directed evolution. By providing a scalable route to evolvable enzymes, DISCO broadens the potential scope of genetically encodable transformations. Code is available at https://github.com/DISCO-design/DISCO.