Browsing by Subject "PCA SVD"
Computational analysis and visualization of the evolution of influenza virus (2014-08) Lam, Ham Ching

Influenza viruses can infect a large variety of birds and mammals, including humans, pigs, domestic poultry, marine mammals, cats, dogs, horses, and wild carnivores \cite{Webster2002}. Surveillance for influenza viruses circulating in humans has gradually increased and expanded to many areas around the world. These surveillance programs have produced a large amount of influenza genomic data, which facilitates the study of the virus by computational methods that are efficient and cost saving.

The main focus of this dissertation research is the development of visualization methods to understand the evolution of influenza viruses circulating in humans and other mammals. The methods developed have been applied to different human influenza A subtypes, swine influenza viruses, and avian influenza viruses. The methods are based on unsupervised dimensionality reduction techniques, which can be applied to each individual genome segment or to the complete genome sequence of the virus. These methods are a departure from the traditional phylogenetic tree construction paradigm because a very large number of high-dimensional input sequences can be processed and the results are viewed directly in a two- or three-dimensional Euclidean space.

We reproduced the evolutionary trajectory of the seasonal human influenza A/H3N2 virus since its introduction to humans in 1968 in a 2D PCA space. The observed pathway led us to hypothesize that vaccination serves as a primary evolutionary pressure on this virus. We provide visual, simulation, and statistical results to support this hypothesis. The North American swine influenza H3N2 viruses were also studied using the developed visualization methods. The diversity of this virus has been changing since the 2009 H1N1 pandemic outbreak. Five main clusters were observed in the visualization results. The mutations at two positively selected sites on the HA gene were identified as the potential drivers of the cluster segregation of this virus after the pandemic.

A visualization method was developed to visually detect reassortant influenza viruses. A reassortant influenza virus is difficult to detect because it consists of genome segments from different parental origins. When two different strains of influenza coinfect a single cell, the exchange of genome segments between the two strains can lead to progeny carrying segments from different parents within its genome. To detect such progeny, we developed a PCA projection based visualization method that examines the full genome sequences of a reference strain and test strains simultaneously and reveals any reassorted segments within a full genome.

Besides the development of visualization methods, we have also developed a compact Markov chain model to estimate the probability of finding viruses with high genetic similarity after a very long time gap. This is a two-component model in which we combined a Markov chain with a Poisson model. The Markov chain describes the evolution process of the virus in terms of Hamming distance, and a computed mutation rate is the input to the Poisson model. Combining the two, we simulated the evolution of the influenza virus under neutral evolution. The computational results from this model led us to conclude that the existence of reservoirs preserving viruses for decades cannot be completely ruled out.

In short, our primary goal has been to develop visualization based approaches to understand the evolution of influenza viruses from different hosts. Our results so far suggest that visualization paves the way to a deeper understanding of, and insight into, the evolution of the virus as we make use of the rapidly growing amount of genomic data on the virus.
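
The PCA projection idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's actual pipeline: the one-hot encoding, the toy sequences, and the helper names `one_hot` and `pca_project` are all assumptions made for illustration. It shows only the general technique of mapping aligned sequences to points in a 2D Euclidean space with PCA computed via SVD.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed nucleotide alphabet for this illustration


def one_hot(seq):
    """Encode one aligned sequence as a flat 0/1 vector (hypothetical encoding)."""
    vec = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq):
        j = ALPHABET.find(base)
        if j >= 0:  # gaps and ambiguous bases are left as all-zero rows
            vec[i, j] = 1.0
    return vec.ravel()


def pca_project(seqs, n_components=2):
    """Project equal-length sequences onto their leading principal components."""
    X = np.array([one_hot(s) for s in seqs])
    Xc = X - X.mean(axis=0)  # center each feature column
    # SVD of the centered data matrix; rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Coordinates of each sequence along the leading components
    return U[:, :n_components] * S[:n_components]


# Toy aligned sequences (made up, not real influenza data)
toy = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "TCGTACCA"]
coords = pca_project(toy)
print(coords.shape)
```

Each row of `coords` is one sequence placed in the 2D PCA space, so sequences that differ at few positions land close together, and the points can be passed directly to any scatter-plot routine.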