Golinski, Alexander
2022-11-14
2022-11-14
2021-08
https://hdl.handle.net/11299/243177
University of Minnesota Ph.D. dissertation. 2021. Major: Chemical Engineering. Advisors: Benjamin Hackel, Stefano Martiniani. 1 computer file (PDF); 191 pages.
Proteins can be engineered to perform a variety of functions ranging from diagnostics and therapeutics to industrial and commercial enzymes. The ability to computationally evaluate the performance of a protein from its amino acid sequence would increase the efficiency of discovery, expanding the impact of engineered proteins. However, the problem is plagued by the immensity, complexity, and barrenness of the amino acid sequence-function landscape. The following research is focused on predicting two nontraditional protein functions: 1) Evolvability - the ability to generate novel functionality based upon the mutation of a subset of amino acid positions, and 2) Developability - the ability to be efficiently manufactured and maintain primary functionality. Limited prior understanding of these functions was available across broad swaths of sequence space. This work advanced a hybrid experimental/computational platform to provide broad and deep experimental data on sequence-function relationships. Empowered by data analytics, the dataset enabled accurate predictions and provided mechanistic insight regarding protein evolvability and developability. The first story aimed to determine which computable biophysical properties drive evolvability. Utilizing high-throughput screens for evolving specific molecular targeting, the performance of seventeen protein scaffolds was obtained for seven molecular targets. A model predicting evolvability from biophysical properties was trained, with a focus on generalizability and interpretability. 
Achieving a 4/6 true positive rate, a 9/11 negative predictive value, and a 4/6 positive predictive value, the predictive model analysis suggests that a large, disconnected paratope (location of sequence variation) will permit evolved binding function. The second story aimed to generate a model to predict protein developability, as determined by bacterial production, from amino acid sequence. As traditional metrics of developability are often capacity-limited (10^2 - 10^3), a set of three high-throughput (10^5) assays was created to generate a sufficient dataset. The relevance of the assays to traditional metrics was certified by a model that predicts expression from assay performance 35% closer to the experimental variance and trains 80% more efficiently than a model predicting from sequence information alone. The validated assays offer the ability to identify developable proteins at unprecedented scales, reducing a bottleneck of protein commercialization. Neural networks were trained to generate a numeric developability representation (embedding) for each sequence from the high-throughput dataset and transfer the embedding to predict recombinant expression. Mimicking protein theory, our deep-learning model convolves machine-learned amino acid properties to predict expression 42% closer to the experimental variance compared to a traditional approach. Analysis of trained numeric encodings of the amino acids highlights the unique capability of cysteine, the importance of hydrophobicity and charge, and the unimportance of aromaticity when aiming to improve developability of the protein scaffold Gp2. The completion of the studies supports the hypothesis that data-driven protein engineering can accurately predict both protein evolvability and developability while also providing meaningful insight into the properties driving functionality. The success of this approach is predicted to increase significantly as the capacity to parametrize protein function continues to grow. 
The research presents the increased ability to engineer proteins across their diverse sequence landscape using modern experimental techniques and data analytics.
en
Developability
Embeddings
Evolvability
Machine Learning
Protein Engineering
Protein Scaffold
Data Driven Approach to Engineering Protein Evolvability and Developability
Thesis or Dissertation
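
The classification metrics quoted in the abstract for the evolvability model (4/6 true positive rate, 9/11 negative predictive value, 4/6 positive predictive value) can be tied back to a single confusion matrix. The sketch below is illustrative only: it assumes the reported fractions are unreduced counts, which the abstract does not state, and infers 4 true positives, 2 false negatives, 2 false positives, and 9 true negatives. Those counts sum to 17, which would be consistent with the seventeen scaffolds mentioned, but the dissertation itself should be consulted for the actual evaluation setup. The function name `rates` is hypothetical.

```python
from fractions import Fraction

# Confusion-matrix counts INFERRED from the reported fractions.
# Assumption: the abstract's ratios are unreduced counts; the abstract
# itself only gives the ratios, not these four numbers.
TP, FN, FP, TN = 4, 2, 2, 9


def rates(tp, fn, fp, tn):
    """Return the three metrics named in the abstract as exact fractions."""
    return {
        # Fraction of truly evolvable cases that the model flags as evolvable.
        "true_positive_rate": Fraction(tp, tp + fn),
        # Fraction of predicted-evolvable cases that are truly evolvable.
        "positive_predictive_value": Fraction(tp, tp + fp),
        # Fraction of predicted-non-evolvable cases that are truly non-evolvable.
        "negative_predictive_value": Fraction(tn, tn + fn),
    }


metrics = rates(TP, FN, FP, TN)
total_cases = TP + FN + FP + TN

for name, value in metrics.items():
    print(f"{name}: {value} ({float(value):.2f})")
print("total cases:", total_cases)
```

Under these assumed counts, the three reported ratios are mutually consistent: the two false negatives appear in the denominators of both the true positive rate and the negative predictive value.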