Browsing by Subject "Genomics"
Now showing 1 - 20 of 21
Item Animals By Design (Minnesota Agricultural Experiment Station, 1998) University of Minnesota. Agricultural Experiment Station

Item Computational analysis of genetic interaction network structures and gene properties (2017-07) Koch, Elizabeth
Cellular systems are responsible for many complex tasks, such as carrying out cell cycle phases, responding to intra- and extra-cellular conditions, and resolving errors. Through analysis of biological networks, researchers have begun to describe how cells coordinate these processes by means of modularity and between-process connections. However, descriptions of this network-based cellular organization often do not incorporate the diverse characteristics and individual behaviors of the genes that compose it. Knowledge of gene properties and their relationships with biological network evolution is crucial for a complete understanding of cellular function, and investigation in this area can lead to general principles of biology that apply to many species. This dissertation will describe analyses of the Saccharomyces cerevisiae (baker’s yeast) genetic interaction network that connect gene topological behavior with various physical, functional, and evolutionary properties of genes. Genetic interactions occur between paired genes whose simultaneous mutations produce unexpected double-mutant phenotypes, which are indicative of a range of functional relationships. Because genetic interactions can be identified genome-wide in high-throughput experiments, their networks are comprehensive and unbiased representations of function to which we can apply computational methods that search for structure-function relationships. We begin by exploring the association between a set of gene properties and gene genetic interaction (GI) degree. Here, we build a decision tree model that sorts genes based on a set of properties, each of which has a correlation with GI degree, and accurately predicts GI degree. We show that our model, trained on S.
cerevisiae, is also accurate for a very distant yeast species, Schizosaccharomyces pombe, demonstrating that the rules governing gene connectivity are well conserved. Finally, we used predictions from the model to identify gene modules that differ between the two yeast species. Next, we further characterize hub genes through an investigation of pleiotropy, the phenomenon of a single genetic locus with multiple phenotypic effects. Pleiotropy has typically been described by counting organism-level phenotypes, but a characterization based on genetic interactions can capture details about cellular processes that are buffered by the cell and never manifest in single mutant cellular phenotypes. For this analysis, we use frequent item set mining to discover GI modules, which we annotate with high-level processes, and use entropy to measure the functional diversity of each gene’s set of containing modules, thus distinguishing between genes whose functional influence is limited to very few bioprocesses and those whose roles are important for varied cellular functions. We identified a number of gene and protein characteristics that differed between genes with high and low pleiotropy and discuss the implications of these results regarding the nature and evolution of pleiotropy.

Item Extant variation in the maize pan-genome (2019-03) Brohammer, Alex
The publication of the B73 maize reference genome assembly in 2009 was a monumental achievement and marked an important milestone in the field of maize genetics. This resource has been pivotal to countless discoveries since its release. One of the most surprising of these discoveries, however, has been the finding that many sequences are missing or significantly diverged from the reference genome. This realization has helped spur the generation of alternative maize reference genome assemblies, including one for the elite inbred line PH207.
The first chapter in this work provides a detailed historical perspective of the study of structural variation in maize and presents a review of the current understanding of the maize pan-genome. The middle chapter consists of original research using the PH207 reference genome to understand the significance of differential fractionation to the prevalence of structural variation in maize. The third chapter explores the contribution of transposable elements to variation in the maize transcriptome. Together these sections highlight the importance of using multiple maize reference genomes to understand the extraordinary diversity in the maize genome and point towards the need for a nuanced and contextualized understanding of this sequence diversity.

Item Genetic analysis and characterization of variegation in hybrid grape populations (Vitis spp.) (2020-09) Olson, Jack
Variegation is a plant trait defined by “plants which develop patches of different colors in the vegetative parts”, although variegation can express in reproductive parts of plants as well (Kirk and Tilney-Bassett, 1978). Variegation is a common trait found among a wide variety of plant species and has been reported in V. vinifera, sport mutations, and segregating in hybrid grape populations (Reisch and Watson, 1984; Filler et al., 1994; personal observation). Variegation is highly desirable in ornamental plant breeding for its showy colors and patterns, but in grape seedlings, it has been observed to have deleterious effects in the form of shorter and less vigorous plants than their wild-type siblings. Mapping the locus or loci associated with variegation would allow for the development of markers to identify parents that may carry the recessive allele for the trait, allowing for more informative decisions on population size and parental combinations when making crosses (Chapter 2).
Three different mapping approaches - bulked segregant analysis (BSA), genome-wide association study (GWAS), and genetic mapping - were utilized to detect and validate associated loci. A total of 9 hybrid grape populations were utilized in mapping, of which all 9 were used in BSA, 3 (GE1642, GE1703, GE1895) were used in GWAS and 2 (GE1642, GE1703) were used in QTL genetic mapping. BSA detected four highly significant SNP markers on chromosome 14 between the physical positions of 21,425,721 and 21,425,734 bp. GWAS identified 24 significantly associated markers on chromosome 14 from 27.1 to 30.1 Mbp in GE1642 and GE1895; however, 9 markers on chromosome 11 from 12.1 to 18.4 Mbp were significantly associated with variegation in GE1703. Genetic mapping of GE1642 and GE1703 mapped variegation to the same regions, which validated the regions identified in GWAS. Thus, two major loci on chromosomes 11 and 14 were associated with variegation in separate hybrid grape populations. Candidate genes for variegation were identified in the two locus regions for future studies. The effects of variegation on hybrid grape were examined in a variety of experiments, in which it was discovered that variegation reduced photosystem II efficiency, reduced leaf chlorophyll and carotenoid concentration, altered leaf palisade mesophyll structure, and significantly reduced plant growth-related traits (Chapter 3).

Item Genomic and transcriptomic approaches for the advancement of CHO cell bioprocessing (2014-06) Vishwanathan, Nandita
Recombinant protein therapeutics have transformed healthcare by paving the way for the treatment of refractory illnesses like cancer and arthritis. Chinese hamster ovary (CHO) cells are the major workhorse for the production of these therapeutics. Striving for continual improvements in the productivity and quality of protein produced in CHO cells, many process enhancements have been successfully implemented.
However, many processes are still empirical, and we have little understanding of the mechanisms for these methods. The availability of genomic resources for CHO cells has ushered in a 'genomics' era in bioprocessing. Genomic resources can now be employed to understand and improve cell lines and processes to enhance the productivity and quality of protein therapeutics produced by CHO cells. Seeking the development of genomic resources for CHO cells, the Chinese hamster genome and transcriptome were sequenced, assembled and annotated. Such transcriptomic resources can be used to study the inherent transcriptomic variability in CHO cells. The genetic cues identified from the study of the variability in the glycosylation pathway genes open up several opportunities to manipulate protein quality. The relative expression of isozymes in CHO cells affects metabolic characteristics, which in turn may potentially impact product quality or even process robustness. The comparative study of isozymes can give important clues for cell engineering and process development. The isozyme distribution in CHO cells indicates a very high overall glycolytic rate, pointing to the possibility of manipulating glycolytic flux for improving processes. Engineering superior metabolism through cell engineering can be used to reduce glycolytic flux in the late stage of the fed batch culture to reduce lactate accumulation. A novel dynamic promoter was used to drive the expression of a fructose transporter selectively in the late stages of the culture. By maintaining adequately low fructose levels in the late stage, the glycolytic flux was reduced significantly to induce lactate consumption. Since lactate accumulation is well accepted to be detrimental to productivity, this phenotype is desired for bioprocessing. In addition to such high productivity processes, high producing cells are also desired. The lengthy process of cell line development transforms non-producing cells to high producers.
The molecular changes in this transformation were elucidated by studying the transcriptome of CHO cells during cell line development. We hypothesize that methotrexate treatment not only increases the transgene copy number, but also enriches cells with superior growth, energy metabolism, and secretion capabilities. This leads to an enriched population of high producers. The sustenance of high productivity over several generations depends on the stability of the integration site of the transgene. Two methods for identifying the cell's transgene integration site were developed and optimized. These methods can be applied for high throughput investigation of stability of integration sites. The application of genomics in bioprocessing has sparked a systems approach to investigate genetic regulation. This knowledge paved the way for controlling cellular metabolism and achieving stable and high producing cell lines and processes. Such genome scale analyses have a great potential to advance the capacity of CHO cells for biopharmaceutical applications.

Item Genomics and domestication of Field Pennycress (Thlaspi arvense) (2015-05) Dorn, Kevin
Thlaspi arvense (field pennycress) is a cold tolerant oilseed species that is being domesticated as a new rapid cycling, winter annual cover crop and feedstock for biodiesel production. Pennycress is related to Arabidopsis thaliana, a model species that has provided an in-depth understanding of many basic developmental and physiological plant processes, which will provide vital information for the rapid domestication of a wild species into a new crop. By targeting key pennycress traits for improvement, such as reducing seed dormancy, increasing rates of spring flowering and maturity, increasing yield, and modifying seed oil composition, we are poised to develop a new winter cash crop that can fit within the corn/soybean rotation.
To enable a mutation breeding approach that utilizes the massive amount of Arabidopsis-based knowledge, genomic resources are needed to identify target genes believed to influence key traits. In this dissertation, the first comprehensive annotated transcriptome assembly and comparative analyses are presented, along with the first draft genome sequence for pennycress. In these analyses, target assembled transcripts and corresponding DNA sequences are identified and compared to Arabidopsis homologs, enabling the forward and reverse genetic screening of large scale mutant populations. An analysis of winter and spring annual pennycress accessions is also presented, which identified several wild alleles of the pennycress FLOWERING LOCUS C homolog, which was found to be responsible for differentiating between spring and winter annual phenotypes. The resources presented herein will provide an unprecedented set of tools to enable the rapid domestication of a new crop species.

Item Haplotype-Based Selection Signature Analysis Using University Of Minnesota And Us Contemporary Holstein Cattle (2015-11) Yang, Jing
Artificial selection in dairy cattle since 1964 has achieved a steady increase in milk production that was accompanied by unintended declines in fertility. We conducted selection signature analysis to identify genome changes due to the forty years of selection using direct comparison of 45,878 SNPs between Holstein cattle unselected since 1964 and contemporary Holsteins. The Holstein genome had a landscape change from the unselected to the elite contemporary Holsteins. About 31% of the genome was affected by the forty years of selection, and 230 regions had highly significant changes in long-range allele frequencies and genotypic heterozygosity.
From these 230 regions, 197 genes with documented fertility functions mostly in mice and humans were identified, leading to the hypothesis that the unintended declines in fertility since 1964 were due to hitchhiking of selection by negative effects of fertility genes. The female-male ratio of the 197 fertility genes is approximately 5:4, indicating that the fertility problems in the contemporary Holstein population were likely due to decreased fertility in both females and males. The elite Holsteins were more heterozygous than their contemporaries in all thirty regions where the elite cows and their contemporaries had significant heterozygosity differences, including seven regions in or near large clusters of olfactory receptors, zinc fingers, cationic amino acid transporters, sialic acid-binding Ig-like genes, vomeronasal receptors, keratin genes, EMR2 receptors, and transfer RNAs.

Item Identification and characterization of DNA methylation variation within maize (2013-05) Eichten, Steven Richard
DNA methylation is a genetic modification known to repress the activity of transposable elements, repetitive sequences, and in some cases genes. Although DNA methylation is often found in common locations across different individuals, evidence has shown that DNA methylation can vary between individuals at certain loci and can therefore have the opportunity to create a unique regulatory environment for the surrounding sequence. Beyond this, the relationship between DNA methylation state and the genetic content of an individual is still unclear. DNA methylation may act as a downstream effect of certain genetic signals, or it may act independently of genetic state as an epigenetic modification. The goal of this thesis is to profile the DNA methylation landscape across maize (Zea mays) and identify the genomic regions that display differential DNA methylation patterns.
These regions of differential methylation are then further studied to understand their stability across generations, their influences on gene expression, as well as their connection to the genetic context in which they are found. The chapters describe the identification of thousands of differentially methylated regions (DMRs) between maize lines. These DMRs are shown to occur throughout the genome and have high stability across generations. In contrast, few DMRs are found across different tissues within the same genotype. DMRs are shown to often be associated with the local genetic variation. This genetic relationship is highlighted, along with the discovery of a mechanism of genetic control by the spreading of DNA methylation from certain retrotransposable elements. These results indicate that DMRs are present in maize and are created through both epigenetic and genetic means.

Item Leveraging Summary Statistics and Integrative Analysis for Prediction and Inference in Genome-Wide Association Studies (2020-07) Pattee, Jack
Genome-wide association studies (GWASs) have attained substantial success in parsing the genetic etiology of complex traits. GWAS analyses have identified many genetic variants associated with various traits, and polygenic risk scores estimated from GWASs have been used to effectively predict certain clinical phenotypes. Despite these accomplishments, GWASs suffer from some pervasive issues with power and interpretability. To address these issues, we develop powerful and novel approaches for prediction and inference on genetic and genomic data. Our approaches focus on two key elements. First is the incorporation of additional sources of genetic and genomic data. A typical GWAS characterizes the genetic basis of a trait in terms of associations between the trait and a set of single nucleotide polymorphisms (SNPs). This approach can often be underpowered and difficult to understand biologically.
We can often increase power and interpretability by effectively incorporating other sources of genetic and genomic data into the single SNP analysis structure. Second is the development of methods that are widely applicable in the context of summary statistics. Many published GWAS analyses do not provide so-called individual level genetic and genomic data, and instead provide only summary statistic information. Given this, we want our methods to be flexible in the context of summary statistics without the need for individual level information. We first develop a novel approach to integrating somatic and germline information from tumors to identify genes associated with lung cancer risk. We leverage this approach to discover potentially novel genes associated with lung cancer. We then investigate the problem of estimating powerful and parsimonious models for polygenic risk scores in the context of summary statistics. We develop a set of novel methods for model estimation, model selection, and the assessment of model performance, and demonstrate their beneficial properties in extensive simulation and in application to GWASs of lung cancer, blood lipid levels, and height. Lastly, we integrate our methods for polygenic risk score estimation into a two-sample two-stage least squares analysis framework to identify potentially novel endophenotypes associated with increased risk of Alzheimer's disease. We demonstrate via simulation and real data application that our approach is powerful and effective.

Item Maize bisulfite coupled sequence capture (SeqCap-Epi-v2) probe design (2017-12-13) Springer, Nathan M; Li, Qing; Crisp, Peter A; pcrisp@umn.edu; Crisp, Peter A; Springer Lab
Table detailing the design and genomic coordinates (v2 and v4) for the maize bisulfite coupled sequence capture (SeqCap-Epi-v2) probe design for profiling of the maize methylome.
Released to accompany our publication on DNA methylation changes induced by tissue culture (Han et al 2017) as a supporting supplemental methods file.

Item Network-based mixture models for genomic data. (2009-06) Wei, Peng
A common task in genomic studies is to identify genes satisfying certain conditions, such as differentially expressed genes between normal and tumor tissues or regulatory target genes of a transcription factor (TF). Standard approaches treat all the genes identically and independently a priori and ignore the fact that genes work coordinately in biological processes as dictated by gene networks, leading to inefficient analysis and reduced power. We propose incorporating gene network information as prior biological knowledge into statistical modeling of genomic data to maximize the power for biological discoveries. We propose a spatially correlated mixture model based on the use of latent Gaussian Markov random fields (GMRF) to smooth gene specific prior probabilities in a mixture model over a network, assuming that neighboring genes in a network are functionally more similar to each other. In addition, we propose a Bayesian implementation of a discrete Markov random field (DMRF)-based mixture model for incorporating gene network information, and compare its performance with that based on Gaussian Markov random fields. We also extend the network-based mixture models to ones that are able to integrate multiple gene networks and diverse types of genomic data, such as protein-DNA binding, gene expression and DNA sequence data, to accurately identify regulatory target genes of a TF.
Applications to high-throughput microarray data, along with simulations, demonstrate the utility of the new methods and the statistical efficiency gains over other methods.

Item Non-volatile In-memory Computing for Large Scale Data-Intensive Workloads: Challenges and Opportunities (2021-12) Chowdhury, Zamshed
Application domains that depend on large amounts of data for solving problems, e.g., genome sequence analysis, graph analytics, and machine learning, suffer from the growing overhead of data communication between physically separate logic (i.e., compute) and memory elements in conventional von Neumann computing. Recent progress in processing(/computing)-in-memory (PIM/CIM), or simply in-memory computing, addresses the data communication overhead in these applications by fusing compute capability with the memory where the data reside, thereby achieving reduced energy consumption and higher application throughput due to access to the higher internal bandwidth of the memory substrate as compared to the off-chip bandwidth. In this thesis, we focus on the architecture- and application-level characterizations of PIM architecture, Computational RAM (CRAM) in particular, for large scale data-intensive workloads, in terms of opportunities and challenges. We demonstrate the efficacy of CRAM in reducing the communication bottleneck of genomic sequence analysis, as a representative application domain due to its importance and inherent characteristics that are suitable for PIM-based implementation, by designing various CRAM-based Hardware (HW) accelerators. The designs cover all architectural aspects such as data layout, spatio-temporal scheduling of compute, and system integration. First, we introduce an in-memory accelerator architecture, BWA-CRAM, for DNA sequence alignment by direct mapping of the state-of-the-art Burrows–Wheeler Aligner algorithm on CRAM.
This architecture outperforms the corresponding software implementation in terms of throughput and energy efficiency, even under conservative assumptions. Next, we improve the performance of DNA sequence (pre-)alignment (and other similar, generic pattern matching applications) through HW/SW co-design and introduce SpinPM, a novel high-density, reconfigurable spintronic in-memory pattern matching substrate based on CRAM with Spin-Orbit-Torque (SOT), specifically Spin-Hall-Effect (SHE), MTJ devices, and demonstrate the performance benefit SpinPM can achieve over conventional and near-memory processing systems. Subsequently, we present CRAM-Seq, an accelerator for RNA-Seq abundance quantification based on CRAM. Through HW/SW co-design, we demonstrate that CRAM-Seq outperforms a commonly used state-of-the-art software abundance quantification algorithm, Kallisto, in terms of throughput and energy efficiency. Next, we introduce Content Addressable Memory (CAM) functionality, which is very efficient in large scale pattern matching, in CRAM. We present CAMeleon, a novel compute substrate that leverages the high energy efficiency benefit of CRAM and is capable of satisfying very stringent hardware resource (area) budgets in embedded/edge computing applications, e.g., handheld sequencing devices. Compared to conventional CAM-only designs based on SRAM and emerging memory technologies (such as STT-MTJ, ReRAM and PCM), CAMeleon performs CAM operations more energy-efficiently while consuming less or similar area, and supports logic and memory functions beyond CAM operations on demand through reconfiguration. Finally, we study the impact on applications’ reliability due to mapping on a PIM substrate, focusing on PIM architectures that perform logic operations directly within memory arrays, in-situ, obviating any need for data transfers (even to and from the array periphery), e.g., CRAM.
Here we (i) quantitatively characterize gate-flip errors, an acute class of functional errors specific to such PIM systems, where, due to parametric variations, a logic gate can behave as another; and (ii) analyze to what extent algorithmic noise tolerance can mask gate-flips.

Item Pharmacogenomic modeling of bortezomib resistance in B cell malignancies (2013-04) Stessman, Holly Annette-Feser
Proteasome inhibitors are a class of drugs that have been largely successful in the treatment of cancer patients, particularly those with the plasma cell malignancy, multiple myeloma. The most successful of these drugs, bortezomib (Bz), has paved the way for the development of next-generation proteasome inhibitors. Although Bz has significantly contributed to improved outcomes in myeloma patients, acquired resistance to Bz is imminent. Furthermore, a portion of patients never initially respond to the drug. Therefore, the goal of these studies was to further characterize Bz resistance with the aim to better predict secondary therapies that may be used successfully with Bz to recapture drug sensitivity. In the first study, we describe the creation of an in vitro malignant mouse plasma cell system from which we create isogenic pairs of Bz-sensitive and -resistant cell lines. We further characterize the transcriptional responses of these cell line pairs to identify both conserved and unique expression signatures. Using the expression signatures that are unique to each pair of cell lines, we identify secondary therapies that may be useful for treatment of the Bz-refractory cell line using an in silico database called Connectivity Map (CMAP). This analysis predicted a unique response to histone deacetylase inhibitors, a class of drugs that are currently being tested for efficacy in myeloma, in only one mouse cell line pair. Indeed, we find that the predicted Bz-resistant cell line has increased sensitivity to this class of drugs (including the drug panobinostat).
When these cells were transferred back into syngeneic recipient mice, panobinostat treatment could successfully extend the life of Bz-resistant animals suggesting that the Bz-resistant phenotype may select also for increased sensitivity to other drugs that may be identified through in silico approaches. In the second study, we follow up these observations by investigating other CMAP prediction patterns, such as those that are conserved across all cell line pairs. A second prediction of one class of these CMAP-predicted drugs using high-throughput drug screening of the cell lines revealed that a combination of these approaches may be highly successful for accurate prediction of secondary therapies. Based on these predictions, we further investigate the efficacy of topoisomerase inhibitors in combination with Bz for the treatment of Bz-resistant cell lines. In the third study, we provide further immunophenotypic characterization of the Bz-sensitive and -resistant mouse cell lines revealing not only cell surface markers that are associated with "acquired" and "innate" Bz resistance but perhaps a mechanism of resistance. Although Bz-sensitive mouse cells display a classic myeloma phenotype, homing to the bone marrow in vivo and expressing classic plasma cell markers, Bz-resistant mouse cells present as extramedullary disease and express a more B cell-like immunophenotype. We identify that differences in migration may be linked to the differential expression of the bone marrow homing protein, CXCR4. Lower expression of this gene in a Bz human clinical trial was also associated with inferior survival. Immunophenotypic characterization of these cell populations further revealed that forced differentiation of the Bz-resistant population could restore Bz-sensitivity. The final study investigates the acquisition of Bz-resistance in a B cell malignancy, Burkitt lymphoma, that is currently undergoing Bz clinical trials.
In this particular malignancy, a DNA mutator, AID, is known to be expressed that may contribute to other types of drug resistance. Here, we identify that this is unlikely to be a mechanism for developing resistance to Bz. Furthermore, we provide evidence that AID activity is reduced in Bz-resistant clones and, in fact, that high AID expression may be selectively eliminated during Bz selection.

Item Population genomics of the legume symbionts Sinorhizobium meliloti and S. medicae (2013-11) Epstein, Brendan
The nitrogen-fixing mutualism between legumes and rhizobia is ecologically and agriculturally important and is a model for the molecular genetics of plant microbe interactions and the evolution of mutualism. The goal of this research was to investigate the evolutionary forces shaping genetic diversity in two species of rhizobia, Sinorhizobium meliloti and S. medicae, by integrating population genetic tools, experimental evolution, and whole-genome sequencing. In Chapter 1, I characterize the diversity and divergence of S. meliloti and S. medicae and ask how selection and horizontal gene transfer (HGT) have shaped nucleotide variation. I find limited evidence for HGT between S. meliloti and S. medicae, indicating that recombination with closely related species does not have much impact on nucleotide diversity in Sinorhizobium spp. and does not prevent species from diverging. I also find that the targets of strong positive selection are different in the two species, suggesting that S. meliloti and S. medicae may be subject to different selective pressures in nature. The goal of Chapter 2 was to examine gene content and copy number variation. While I find that S. meliloti and S. medicae both have extensive variation in content and copy number, most of this variation seems to be deleterious. This suggests that the large size of bacterial pangenomes is due, in part, to many short-lived, deleterious gains and losses of genes rather than adaptation.
Finally, in Chapter 3, I use experimental evolution to test for costs of mutualism. I do not find clear evidence of costs, but I do identify a mutation in the purM gene that may affect host range. Overall, this contributes to our understanding of both the evolution of rhizobia and the evolutionary forces shaping variation in prokaryotes.

Item Reciprocal Informants: Using Fungal Bioinformatics, Genomics, and Ecology to tie Mechanisms to Ecosystems (2019-08) Lofgren, Lotus
Across both wild and human-structured ecosystems, fungi interact with every plant species on earth. From mycorrhizal mutualisms, harmless endophytes, and deadly pathogens, the results of these interactions can mean the difference between a plant’s ability to grow and flourish, or languish and expire. Fungal-host dynamics are not static traits, either over evolutionary time or during the lifetime of individuals where ecological context dependency shapes the outcomes of fungal-host interactions. Understanding the ecological and genetic factors that structure plant-fungal relationships has wide ranging consequences for ecosystems, agro-ecosystems, and human health. However, it’s not well understood how complex genetic mechanisms and ecological pressures work in concert to structure the outcomes of fungal-host interactions, particularly among fungal mutualists. This dissertation contributes to this understanding by investigating how fungal-host relationships are regulated at two levels: broadly, investigating the ecology of fungal-host systems, and specifically, investigating the genetic and genomic basis of how these interactions are mediated. I begin Chapter 1 from the perspective of fungal ecology, investigating the influence of neighborhood (the surrounding plant community) on host specificity patterns using the host-specialist ectomycorrhizal (ECM) genus Suillus.
The number of host species that a given fungal species will associate with, and how closely related these host species are, is the study of fungal host specificity. While some fungi associate with only a single species of host (high host specificity), most associate with tens or hundreds of host species (low host specificity). Fungi in the genus Suillus are famous for their high host specificity, primarily associating with plants in the family Pinaceae (particularly White Pines, Red Pines, and Larches). Using a combination of field sampling, sequencing, and colonization bioassays, I present evidence that one species, S. subaureus, has undergone a novel host-expansion onto Angiosperms, and argue that neighborhood effects influence ECM colonization outcomes over both space and time. In Chapter 2, I expand from fungal ecology into fungal genomes. Using genome mining and comparative genomics, I look for signatures of ECM host specificity using 19 genome-sequenced Suillus species in relation to 1) other (non-Suillus) ECM fungi and 2) an intrageneric comparison between Suillus that specialize on Red Pine, White Pine, or Larch. I present evidence for the involvement of several molecular classes in regulating Suillus host specificity, including species-specific small secreted proteins, G-protein coupled receptors, and terpene secondary metabolites. Finally, in Chapter 3, I use the genomic and bioinformatic tool sets developed in Chapters 1 and 2 to expand my analysis across the fungal phylogeny and ask questions about a potential molecular correlate of fungal guild and trophic mode: ribosomal DNA (rDNA) copy number. To do this, I developed a bioinformatic pipeline to estimate rDNA copy number variation from whole genome sequence data, and applied it to a phylogenetically and ecologically diverse set of 91 fungal genomes. 
I present evidence that rDNA copy number is inversely associated with phylogenetic distance, but displays a high level of variation, spanning an order of magnitude in Suillus alone, with no detectable correlation to guild occupation or genome size. Taken together, the work presented here shows that genomic and bioinformatic approaches used in concert with classical ecological methodologies offer great potential to expand our understanding of the two-way influence of ecosystem-level processes and gene-level mechanisms in structuring plant-fungal interactions.
Item Regions of High Confidence in Chinese Hamster and CHO-K1 Genome Assemblies (2016-04-20) Vishwanathan, Nandita; Bandyopadhyay, Arpan; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A. T.; Jacob, Nitya M.; Le, Huong; Karypis, George; Hu, Wei-Shou; wshu@umn.edu; Hu, Wei-Shou
Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. This dataset includes two genome annotation files that identify the 'high confidence regions' shared by the genome assemblies being compared. A potential use of these files is to find locations in the publicly available genome that are likely to be assembled correctly. 
These regions can be used confidently for genome engineering.
Item The Road From Variants To Traits: How Regulatory Variants Affect Gene Expression & Organismal Phenotypes (2024-03) Renganaath, Kaushik
Nature hosts an incredible amount of diversity, and beneath such diversity lies fascinating genetics that we have spent years trying to decode. Differences in our DNA sequences lead to variation in organismal traits. Most of these variants have been found to reside in noncoding portions of the genome, implying that a lot of organismal trait variation arises from variation in gene expression levels. Advances in sequencing technology have over the years allowed us to map hundreds of genomic loci underlying gene expression variation, and these loci are called expression quantitative trait loci (eQTLs). These eQTLs are of two types, local and trans, depending on their proximity to the genes they regulate. Local eQTLs regulate expression of genes in close genomic proximity, while trans eQTLs regulate distant genes. Today, we possess a vast catalog of eQTLs across multiple taxa. Yet, we don’t fully understand the mechanisms by which eQTLs affect organismal traits. In this dissertation, I computationally dissect the mechanisms connecting genetic variation, gene expression, and organismal traits in the yeast Saccharomyces cerevisiae. As the first eukaryotic organism to have its genome fully sequenced, S. cerevisiae has over the years been a workhorse for understanding the genetics underlying complex traits. Today we have comprehensive sets of QTLs underlying traits like gene expression and growth in yeast that account for most of the heritable variation in these traits, allowing us to investigate the mechanisms by which eQTLs lead to organismal trait variation. In this dissertation, I characterize causal variants underlying local eQTLs in yeast (Chapter II) and the mechanisms by which eQTLs influence growth in different conditions (Chapter III). 
My work unravels fundamental principles by which eQTLs influence complex organismal traits.
Item SAM Filtering Pipeline (SFP): Algorithm for the determination of integration sites from next generation sequencing data (2019-07-16) O'Brien, Sofie A; Hu, Wei-Shou; acre@umn.edu; Hu, Wei-Shou
The locus at which a vector harboring a product transgene integrates into the genome can have a profound effect on the transgene’s transcript level and the stability of the resulting cell line. The SAM filtering pipeline (SFP) was created to identify the integration site(s) of a transfected vector from next generation genome sequencing data. It is best suited for targeted sequence data, such as that from sequence capture of probed vector regions. However, it will also work for whole genome sequencing data, though the memory requirements are large (the more reads in your data set, the larger the memory requirements). A bwa-mem mapped .sam file is required as input to the pipeline.
Item Single Nucleotide Polymorphism Calls for 49 Giant Pandas (2015-09-24) Garbe, John R; Da, Yang; jgarbe@umn.edu; Garbe, John
This dataset contains 150,025 single nucleotide polymorphism (SNP) calls for 49 giant pandas. The SNPs are called from whole-genome shotgun sequencing data. The data are useful for studying the population genetics of pandas and are released in tandem with a paper describing conclusions drawn from this dataset.
Item Systems analysis of complex biological data for bioprocess enhancement. (2008-12) Charaniya, Salim Pyarali
Recent advances in data-driven knowledge discovery approaches, such as 'omics' technologies, provide enormous opportunities to uncover the multifarious determinants of several pharmaceutically relevant biological traits. This work focuses on two such challenges: (i) deciphering the regulation of antibiotic production in Streptomyces coelicolor, and (ii) elucidating the attributes of high recombinant protein productivity in mammalian cell culture processes. 
The phenotypic complexity of Streptomycetes, which produce several clinically relevant antibiotics and other natural products, manifests in their diversity of secondary metabolism and morphological differentiation. To identify the dynamic gene regulatory networks that confer such complex phenotypes, the temporal transcriptomic characteristics of the model organism S. coelicolor, under more than twenty-five diverse genetic and environmental perturbations, were integrated with other functional and genomic features. A whole-genome operon map was also predicted, and a significant portion of the map was experimentally verified. Such a systems approach can reveal several insights about the functional processes relevant for antibiotic production. The therapeutic value of recombinant proteins has brought about a continuously rising demand that is met by development of hyper-producing mammalian cell lines. However, the molecular ingredients of high productivity are not well understood. The transcriptomes of several recombinant antibody-producing NS0 cell lines with a wide productivity range were surveyed in an attempt to identify the physiological functions that are modulated in high-producing cells. Cell culture process enhancement also entails an understanding of the process parameters and their interactions, which are critical determinants of high recombinant protein productivity. The comprehensive process archives of modern production plants present vast, underutilized resources containing information that, if unearthed, can enhance process robustness. The on-line and off-line process data of several production 'trains' from a commercial manufacturing facility were investigated using kernel-based machine learning tools to elucidate predictive correlations between process parameters and the outcome. Together, such discovery strategies based on integrative data mining hold immense potential for enhancing our understanding of industrially relevant biological processes.