Engineering highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening

Tuesday July 9th, 4-5pm EST | Neil Thomas, PhD (EvolutionaryScale) and David Belanger, PhD (Google DeepMind)

Abstract: Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology, but optimization is often hindered by a rugged, expansive protein search space and costly experiments. In this work, we present TeleProt, an ML framework that blends evolutionary and experimental data to design diverse protein variant libraries, and we use it to improve the catalytic activity of a nuclease enzyme that degrades biofilms that accumulate on chronic wounds. After running multiple rounds of high-throughput experiments with TeleProt and standard directed evolution (DE) in parallel, we find that TeleProt discovered a significantly better top-performing enzyme variant than DE, achieved a higher hit rate for diverse, high-activity variants, and could even design a high-performance initial library using no prior experimental data. We have released a dataset of 55K nuclease variants, one of the most extensive genotype-phenotype enzyme activity landscapes to date, to drive further progress in ML-guided design.

Preprint: https://www.biorxiv.org/content/10.1101/2024.03.21.585615v7

Neil Thomas is a research scientist at EvolutionaryScale developing AI tools for protein design. He previously worked at Google X, where he designed proteins by combining machine learning with high-throughput screening. He completed his PhD in Computer Science at UC Berkeley, where his research focused on developing and evaluating methods for protein representation learning, especially protein language models.

David Belanger is a research scientist at Google DeepMind using machine learning to help understand the function of natural proteins and to engineer novel proteins with desirable properties. He received his PhD from UMass Amherst, where he worked on machine learning methods for structured prediction in NLP.