Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences

Tuesday, May 7th, 4-5pm EST | Jeff Ruffolo, PhD and Stephen Nayfach, PhD

Abstract: Gene editing has the potential to solve fundamental challenges in agriculture, biotechnology, and human health. CRISPR-based gene editors derived from microbes, while powerful, often show significant functional tradeoffs when ported into non-native environments, such as human cells. Artificial intelligence (AI) enabled design provides a powerful alternative with potential to bypass evolutionary constraints and generate editors with optimal properties. Here, using large language models (LLMs) trained on biological diversity at scale, we demonstrate the first successful precision editing of the human genome with a programmable gene editor designed with AI. To achieve this goal, we curated a dataset of over one million CRISPR operons through systematic mining of 26 terabases of assembled genomes and meta-genomes. We demonstrate the capacity of our models by generating 4.8x the number of protein clusters across CRISPR-Cas families found in nature and tailoring single-guide RNA sequences for Cas9-like effector proteins. We experimentally tested 209 of our novel Cas9-like proteins in human cells with a hit-rate of 63%. Our most performant gene editor, denoted as OpenCRISPR-1, demonstrates comparable activity and markedly improved specificity relative to SpCas9, the prototypical gene editor, while being >400 mutations away in sequence. Our results highlight the power of LLMs for generating novel proteins with complex functional activities beyond the scope of structure-based design methods.

Preprint: https://www.biorxiv.org/content/10.1101/2024.04.22.590591v1

Jeff Ruffolo is a Machine Learning Scientist at Profluent Bio, where he develops machine learning methods for functional protein design. He obtained his PhD in biophysics at Johns Hopkins University, where he worked in the lab of Jeffrey Gray. During this time, he developed deep learning tools for antibody structure prediction, language modeling, and representation learning. At Profluent, he has contributed to the OpenCRISPR initiative and led the development of next-generation protein language models with atomistic control for functional protein generation.

Stephen Nayfach leads Bioinformatics at Profluent Bio. He was formerly a Research Scientist at the Joint Genome Institute and received his PhD in Biomedical Informatics from UCSF. His past research focused on mining metagenomes and building software tools to understand the evolution of microbial communities and to uncover the hidden genetic diversity of microbes from the biosphere.