Use of Machine Learning to Predict the Desiccation Tolerance of Bacteria
2021-08
Type
Thesis or Dissertation
Abstract
Understanding and identifying desiccation resistance in bacteria is key to their efficient long-term storage and use in environmental applications. Desiccation tolerance was once a common way of characterizing bacteria, so data exist for a wide range of bacterial species. Since the advent of transcriptomics, multiple papers have reported gene expression levels during desiccation stress. Many reviews have also described mechanisms and genes relevant to desiccation tolerance in bacteria, but an overarching framework for predicting desiccation survival in bacteria is lacking. Models built from data collected from the literature have successfully predicted aerobic versus anaerobic phenotype, enzyme function, and substrate specificity (Robinson et al., 2020; Jabłońska et al., 2019). Building on this wealth of previous research, machine learning was used to create a robust model that predicts desiccation tolerance from bacterial genomes. The accuracy of the model was validated with a desiccation assay carried out over three months.
To build the model, a literature review was conducted to find genes that were upregulated more than two-fold during desiccation stress in bacteria. The review yielded 2609 genes from 11 papers, which were condensed to 1082 genes after removing homologs and genes with near-zero variance. A second literature search identified bacterial species with a known desiccation response, either tolerant or sensitive, and a publicly available genome. Thirty-five desiccation-tolerant and 33 desiccation-sensitive genomes were chosen and then queried for the previously curated list of desiccation-upregulated genes. Approximately 176,800 genes were analyzed, and genes with near-zero variance were removed. The remaining 75,982 genes are included in the model (Rogozin et al., 2002).
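The abstract does not state the exact variance criterion or the software used for filtering. As an illustration only, the following minimal sketch shows one common way to drop near-zero-variance features from a gene presence/absence matrix: a column is flagged when a single value accounts for at least a chosen share of the genomes. The function names, the 0.95 threshold, and the toy matrix are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def near_zero_variance_columns(matrix, max_dominant_share=0.95):
    """Return indices of columns in which one value dominates.

    The 0.95 threshold is an assumed example value, not the thesis setting.
    """
    n_rows = len(matrix)
    flagged = []
    for col in range(len(matrix[0])):
        counts = Counter(row[col] for row in matrix)
        dominant = counts.most_common(1)[0][1]  # count of the most frequent value
        if dominant / n_rows >= max_dominant_share:
            flagged.append(col)
    return flagged


def drop_columns(matrix, flagged):
    """Return a copy of the matrix without the flagged columns."""
    drop = set(flagged)
    keep = [c for c in range(len(matrix[0])) if c not in drop]
    return [[row[c] for c in keep] for row in matrix]


# Toy data: 40 hypothetical genomes (rows) x 4 hypothetical genes (columns).
# Gene 0 is present everywhere, gene 1 alternates, gene 2 is present in one
# genome only, and gene 3 is absent everywhere.
matrix = [[1, i % 2, 1 if i == 0 else 0, 0] for i in range(40)]

flagged = near_zero_variance_columns(matrix)
filtered = drop_columns(matrix, flagged)
print("flagged columns:", flagged)
print("columns kept:", len(filtered[0]))
```

Only the alternating gene carries information that could separate the two phenotypes, so it is the only column that survives the filter in this toy case.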
A supervised random forest approach was used to create a preliminary model for desiccation resistance. The genomes were split into 80% training data and 20% test data, and the model was run 100 times with different random seeds, each using 10-fold cross-validation with three repeats. The mean accuracy over the 100 iterations was 0.898 ± 0.0266, indicating that the model correctly predicted the desiccation phenotype of the test data 89.8% of the time on average. For experimental validation, the viability of 28 bacteria was examined: seven with documented desiccation phenotypes and 21 with no known desiccation phenotype. Across all organisms tested, the model had an accuracy of 0.75, demonstrating good model performance.
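The evaluation protocol described above (stratified 80/20 split, repeated over many seeds, accuracy reported as mean ± standard deviation) can be sketched as follows. This is not the thesis code: the software used is not stated in the abstract, so the example is a small pure-Python random forest over binary presence/absence features, trained on synthetic data. The tree depth, number of trees, features tried per split, and the data generator are assumptions, and the repeated 10-fold cross-validation step is omitted for brevity.

```python
import random
from collections import Counter
from statistics import mean, stdev


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def grow_tree(X, y, rng, n_try, depth=0, max_depth=4):
    """Grow one tree on binary features; a leaf is a class label (str)."""
    if len(set(y)) == 1 or depth == max_depth:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_try):  # random feature subset
        left = [lab for row, lab in zip(X, y) if row[f] == 0]
        right = [lab for row, lab in zip(X, y) if row[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority(y)
    f = best[1]
    branches = []
    for value in (0, 1):
        idx = [i for i, row in enumerate(X) if row[f] == value]
        branches.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                  rng, n_try, depth + 1, max_depth))
    return (f, branches[0], branches[1])


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if row[f] == 1 else left
    return tree


def fit_forest(X, y, rng, n_trees=25, n_try=5):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, n_try))
    return forest


def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])


def stratified_split(y, rng, test_fraction=0.2):
    """Split indices so each class contributes about test_fraction to the test set."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def evaluate(X, y, seeds):
    accuracies = []
    for seed in seeds:
        rng = random.Random(seed)
        train, test = stratified_split(y, rng)
        forest = fit_forest([X[i] for i in train], [y[i] for i in train], rng)
        hits = sum(predict_forest(forest, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return accuracies


# Synthetic stand-in data: 35 "tolerant" and 33 "sensitive" genomes with
# 5 informative and 15 noise genes (hypothetical, not the thesis data).
data_rng = random.Random(0)
y = ["tolerant"] * 35 + ["sensitive"] * 33
X = []
for label in y:
    p = 0.8 if label == "tolerant" else 0.2
    informative = [int(data_rng.random() < p) for _ in range(5)]
    noise = [int(data_rng.random() < 0.5) for _ in range(15)]
    X.append(informative + noise)

accuracies = evaluate(X, y, seeds=range(20))  # the thesis reports 100 runs
print(f"accuracy: {mean(accuracies):.3f} ± {stdev(accuracies):.3f}")
```

Reporting the spread across seeds, rather than a single split, matters with only 68 genomes: each test set holds about 14 organisms, so one misclassification shifts a single run's accuracy by roughly seven percentage points.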
Description
University of Minnesota M.S. thesis. 2021. Major: Microbial Engineering. Advisors: Lawrence Wackett, Alptekin Aksan. 1 computer file (PDF); viii, 69 pages.
Suggested citation
Clipsham, Maia. (2021). Use of Machine Learning to Predict the Desiccation Tolerance of Bacteria. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/224921.