Protein Design with Guided Discrete Diffusion

Tuesday, November 14th, 4-5pm EST | Samuel Stanton, PhD (Prescient Design) + Nate Gruver (PhD student, NYU)

Summary: A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments.

Preprint: https://arxiv.org/abs/2305.20009

Sam’s website: https://samuelstanton.github.io/

Nate’s website: https://ngruver.github.io/

Samuel Stanton is a machine learning scientist at Genentech, working with the Prescient Design team on ML-driven drug discovery. His work ranges from fundamental research on uncertainty quantification and decision-making with ML to research on applications like antibody engineering. Prior to joining Genentech, Samuel received his PhD from the NYU Center for Data Science, where he worked with Dr. Andrew Gordon Wilson.

Nate Gruver is a PhD student at NYU, where he works on deep learning with a focus on generative models for biology and chemistry. He is advised by Andrew Gordon Wilson and works closely with Kyunghyun Cho. Nate received his BS and MS in computer science from Stanford University, where he worked with Stefano Ermon and was advised by Chris Piech.