Data Driven Approach to Engineering Protein Evolvability and Developability

Published Date

2021-08

Type

Thesis or Dissertation

Abstract

Proteins can be engineered to perform a variety of functions ranging from diagnostics and therapeutics to industrial and commercial enzymes. The ability to computationally evaluate the performance of a protein from its amino acid sequence would increase the efficiency of discovery, expanding the impact of engineered proteins. However, the problem is plagued by the immensity, complexity, and barrenness of the amino acid sequence-function landscape. The following research is focused on predicting two nontraditional protein functions: 1) Evolvability - the ability to generate novel functionality based upon the mutation of a subset of amino acid positions, and 2) Developability - the ability to be efficiently manufactured and maintain primary functionality. Limited prior understanding of these functions was available across broad swaths of sequence space. This work advanced a hybrid experimental/computational platform to provide broad and deep experimental data on sequence-function relationships. Empowered by data analytics, the dataset enabled accurate predictions and provided mechanistic insight regarding protein evolvability and developability. The first story aimed to determine which computable biophysical properties drive evolvability. Utilizing high-throughput screens for evolving specific molecular targeting, the performance of seventeen protein scaffolds was obtained for seven molecular targets. A model predicting evolvability from biophysical properties was trained, with a focus on generalizability and interpretability. The model achieved a 4/6 true positive rate, a 9/11 negative predictive value, and a 4/6 positive predictive value, and its analysis suggests that a large, disconnected paratope (the location of sequence variation) permits evolved binding function. The second story aimed to generate a model to predict protein developability, as determined by bacterial production, from amino acid sequence.
As traditional metrics of developability are often capacity-limited (10^2 - 10^3), a set of three high-throughput (10^5) assays was created to generate a sufficient dataset. The relevance of the assays to traditional metrics was confirmed by a model that predicts expression from assay performance 35% closer to the experimental variance and trains 80% more efficiently than a model predicting from sequence information alone. The validated assays offer the ability to identify developable proteins at unprecedented scales, reducing a bottleneck of protein commercialization. Neural networks were trained to generate a numeric developability representation (embedding) for each sequence from the high-throughput dataset and transfer the embedding to predict recombinant expression. Mimicking protein theory, our deep-learning model convolves machine-learned amino acid properties to predict expression 42% closer to the experimental variance compared to a traditional approach. Analysis of trained numeric encodings of the amino acids highlights the unique capability of cysteine, the importance of hydrophobicity and charge, and the unimportance of aromaticity when aiming to improve developability of the protein scaffold Gp2. The completion of the studies supports the hypothesis that data-driven protein engineering can accurately predict protein evolvability and developability while also providing meaningful insight into the properties driving functionality. The success of this approach is predicted to increase significantly as the capacity to parametrize protein function continues to grow. The research demonstrates an increased ability to engineer proteins across their diverse sequence landscape using modern experimental techniques and data analytics.
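
The three rates reported for the evolvability model can be tied together with a small worked example. The abstract does not state the underlying confusion matrix, so the counts below are an inference: assuming the fractions are unreduced and come from a single evaluation over the seventeen scaffolds, the only counts consistent with all three rates are 4 true positives, 2 false negatives, 2 false positives, and 9 true negatives (4 + 2 + 2 + 9 = 17). The function name `confusion_rates` is hypothetical and used only for illustration.

```python
def confusion_rates(tp, fn, fp, tn):
    """Return each rate as an unreduced (numerator, denominator) pair."""
    return {
        # Fraction of truly evolvable scaffolds that the model flags.
        "true_positive_rate": (tp, tp + fn),
        # Fraction of predicted-evolvable scaffolds that truly are.
        "positive_predictive_value": (tp, tp + fp),
        # Fraction of predicted-non-evolvable scaffolds that truly are not.
        "negative_predictive_value": (tn, tn + fn),
    }


# Counts inferred from the reported fractions (an assumption, not stated
# in the abstract): 4 TP, 2 FN, 2 FP, 9 TN.
counts = {"tp": 4, "fn": 2, "fp": 2, "tn": 9}
rates = confusion_rates(**counts)
total_scaffolds = sum(counts.values())

for name, (num, den) in rates.items():
    print(f"{name}: {num}/{den} = {num / den:.2f}")
print(f"total scaffolds: {total_scaffolds}")
```

Under these assumed counts, the model predicts 6 scaffolds as evolvable and 11 as not, which together account for all seventeen scaffolds in the study.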

Description

University of Minnesota Ph.D. dissertation. 2021. Major: Chemical Engineering. Advisors: Benjamin Hackel, Stefano Martiniani. 1 computer file (PDF); 191 pages.

Suggested citation

Golinski, Alexander. (2021). Data Driven Approach to Engineering Protein Evolvability and Developability. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/243177.
