Liu, Chaochih
2024-01-05
2024-01-05
2023-06
https://hdl.handle.net/11299/259763
University of Minnesota Ph.D. dissertation. June 2023. Major: Plant and Microbial Biology. Advisors: Peter Morrell, Suzanne McGaugh. 1 computer file (PDF); ix, 122 pages + 4 supplementary tables.
As sequencing becomes more affordable, the volume of sequence data continues to grow exponentially, and the datasets used in biological studies continue to increase in size. Reproducible research is increasingly important, especially for studies that rely heavily on computational analyses. Custom code written for study-specific goals that is well documented and hosted in version-controlled source code repositories can accelerate future studies with similar data processing or analysis steps. For example, updating the reference genome positions of old genotyping SNPs as new reference genome versions are released is just as important as handling the latest long-read sequencing technologies. In this dissertation, I present multiple computational solutions to problems relevant to crop improvement. In Chapter 1, I use whole-genome sequencing data from 11 barley lines derived from sodium azide mutagenesis to characterize the mutations induced by sodium azide and to understand the nature of variants that reduce fitness. In this work, careful variant filtering was performed to identify the variants generated by the mutagen. In Chapter 2, I investigate 318 Wild Barley Diversity Collection accessions for evidence of introgression from 2,446 domesticated barley accessions. Information on wild accessions showing crop-to-wild introgression can be used to make informed decisions about which wild samples to include in downstream applications. Finally, in Chapter 3, I develop a reproducible computational workflow that automates the scoring of crossovers as a phenotype, providing a way to quickly evaluate the amount of crossover rate variation present in any biparental population.
Output from the computational workflow can be used to address limitations imposed by linked selection in breeding populations. For all chapters, the code used for data processing and analyses is stored in public GitHub repositories to speed up future research that builds on this work.
en
Barley
Computational pipeline
Introgression
Recombination
Sodium azide mutagenesis
Biological problem solving through computation
Thesis or Dissertation