Bioinformatics solution for clinical utilization of next generation DNA sequencing

Author: Middha, Sumit
Date issued: 2014-09
Date available: 2014-12-23
URI: https://hdl.handle.net/11299/168275
University of Minnesota Ph.D. dissertation. September 2014. Major: Biomedical Informatics and Computational Biology. Advisor: Dr. Claudia Neuhauser. 1 computer file (PDF); x, 132 pages, appendix A.

DNA sequencing as an application of Next Generation Sequencing (NGS) is beginning to reshape how physicians diagnose their patients and make treatment decisions. NGS technologies provide great depth of information, with unprecedented data throughput, scalability and speed. The terabytes of data they generate have created a need for efficient bioinformatics analysis and interpretation. My dissertation provides an end-to-end solution for analyzing DNA sequencing data and for interpreting and delivering the results efficiently and effectively. In collaboration with Mayo Clinic's bioinformatics core, I developed the Targeted RE-sequencing Annotation Tool (TREAT), a modular, robust workflow that provides a backbone for NGS DNA analysis [1]. TREAT is one of the first bioinformatics solutions to integrate alignment, variant calling, annotation and visualization of DNA sequencing data. To better evaluate the growing use of NGS in the clinical domain, I designed a module for comprehensive depth-of-coverage evaluation of genes and variants of interest. This module, which extends the TREAT pipeline, helps quantify the applicability of NGS to clinical gene panels [2]. As whole genome sequencing becomes cheaper and more widely available, turnaround time remains a major factor in the clinical adoption of NGS. I developed a novel iterative bioinformatics approach that expedites whole genome analysis by focusing on clinically relevant genomic regions, reporting results in less than 10% of the original processing time [3].
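
The depth-of-coverage evaluation mentioned above can be illustrated with a minimal sketch. This is not TREAT's implementation: it assumes per-base read depths are already available as a mapping from 0-based position to depth (for example, derived from `samtools depth` output, whose positions are 1-based and would need shifting), and the `Region` type, the `coverage_summary` function and the default 20x threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A gene or target interval (hypothetical type, not part of TREAT)."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def coverage_summary(depths: dict[int, int], region: Region, min_depth: int = 20) -> dict:
    """Summarize depth of coverage over one region (hypothetical helper).

    `depths` maps 0-based position -> read depth; positions missing from
    the mapping are treated as depth 0. The 20x default is an assumption.
    """
    length = region.end - region.start
    if length <= 0:
        raise ValueError("region must have positive length")

    # Per-base depth across the whole region, filling gaps with 0.
    per_base = [depths.get(pos, 0) for pos in range(region.start, region.end)]

    # Count bases meeting the minimum-depth threshold.
    covered = sum(1 for d in per_base if d >= min_depth)

    return {
        "region": region.name,
        "mean_depth": sum(per_base) / length,
        "pct_at_min_depth": 100.0 * covered / length,
        # Positions below threshold, so variants of interest can be checked.
        "low_coverage_positions": [
            region.start + i for i, d in enumerate(per_base) if d < min_depth
        ],
    }


if __name__ == "__main__":
    example_depths = {100: 30, 101: 25, 102: 10, 103: 40}
    summary = coverage_summary(example_depths, Region("GENE_X", 100, 105))
    print(summary)
```

Reporting the positions that fall below the threshold, alongside the percentage, makes it possible to check whether specific variants of interest sit in poorly covered stretches; the threshold itself is a policy choice rather than a fixed rule.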
Further research using additional clinical annotation enabled a comprehensive evaluation of genotype-phenotype correlation for clinically reportable variants. Here I report the characteristics of the clinically relevant variants typically expected per individual in whole exome DNA sequencing data. These data highlight challenges that remain to be addressed: phenotype issues, namely disease penetrance and uncertainty about what is clinically reportable, and sequencing issues such as incomplete coverage, thresholds for data filtering, and the lack of high-quality databases for functional annotation.

Language: en
Subject: Biomedical informatics and computational biology
Type: Thesis or Dissertation