Browsing by Subject "de novo assembly"
Item: Identification Of Genetic Variation In Highly Divergent Regions Using Whole Exome Sequencing (2016-12)
Tian, Shulan
Whole exome sequencing is widely used for identifying disease-associated variants in both clinical and research settings. Using this technology to accurately identify genetic variants is essential, yet major challenges remain in highly divergent but medically important genomic regions. We developed an analytical workflow enabling sensitive and accurate variant discovery for highly divergent genomic regions from whole exome sequencing data. Our workflow combines both mapping- and de novo assembly-based approaches, for which the tools were selected and optimized through extensive evaluation of their performance across different coverage depths and divergence levels, the two key factors profoundly impacting variant detection. We used simulated exome reads for an initial assessment and then public exome data from a well-studied CEPH individual, NA12878, for more focused evaluations. Our analysis revealed that the 25 combinations of five mappers and five callers performed comparably, as expected, in the non-HLA regions, which have approximately 0.1-0.4% divergence. However, they differed markedly in the HLA region, in which different haplotypes can show up to 10-15% divergence. We also evaluated the effect of post-alignment processing and provide a practical guideline regarding the application of local realignment and base quality score recalibration in designing analytical workflows. We transferred our findings into a highly sensitive and computationally efficient workflow for mapping-based variant discovery. Through our two-tier mapping strategy, it excels in both sensitivity and speed, not only in regions of high divergence but also in regions of low divergence. To utilize the local phasing information and identify transmitted variants, we also developed a de novo assembly-based variant calling workflow for whole exome data.
It performs well over a wide range of coverage depths and divergence levels. In fact, for SNP detection from the HLA region, it is far superior to all other existing methods on both simulated and multiple benchmarked exome datasets. Finally, we incorporated the mapping- and de novo assembly-based approaches into a single pipeline, providing the flexibility of variant detection through executing either or both methods. Our pipeline should be particularly useful for WES projects focusing on diseases that are associated with HLA or other highly divergent regions.

Item: Whole Genome Assembly and Annotation of Northern Wild Rice (Zizania palustris L.), a North American Grain (2021-07-23)
Haas, Matthew W; Kono, Thomas; Macchietto, Marissa; Millas, Reneth; McGilp, Lillian; Shao, Mingqin; Duquette, Jacques; Hirsch, Candice N; Kimball, Jennifer A; jkimball@umn.edu; University of Minnesota Cultivated Wild Rice Breeding and Genetics Lab
Northern Wild Rice (NWR; Zizania palustris L.) is an aquatic grass native to North America that is notable for its nutritious grain. The species has ecological, cultural, and agricultural significance, particularly in the Great Lakes region of the United States. Using long- and short-range sequencing, Hi-C scaffolding, and RNA-seq data from eight tissues, we generated a whole genome de novo assembly and annotation of NWR. The assembly is 1.29 Gb, highly repetitive (~76.0%), and contains 46,421 protein-coding genes. Comparative analyses revealed conservation of large syntenic blocks with Oryza sativa L., which were used to identify putative seed shattering genes. Estimates of divergence times revealed that the Zizania genus diverged from Oryza ~26-30 million years ago (MYA), while NWR and Zizania latifolia diverged from one another ~6-8 MYA. Comparative genomics revealed evidence of a whole genome duplication in NWR ~5.3 MYA, after the NWR-Z. latifolia speciation event.
This high-quality genome assembly and annotation is a valuable resource for comparative genomics in the Oryzeae tribe and an important resource for future conservation and breeding efforts of NWR.