Alsagabi, Majid Ibrahim
2014-11-12
2014-11-12
2012-09
https://hdl.handle.net/11299/167707
University of Minnesota Ph.D. dissertation. September 2012. Major: Electrical Engineering. Advisor: Professor Ahmed H. Tewfik. 1 computer file (PDF); vii, 123 pages.

Human DNA copy number variation (DCV) has been shown to correlate with abnormal traits and features in human beings. The genomic hybridization experiment is a powerful biological tool for measuring the DNA copy number at thousands or millions of genomic sites simultaneously. The experiment is subject to large amounts of noise and a high level of uncertainty about the biological meaning of its measurements.

The existing methods for detecting DCV are based on the two-channel approach, which consists of test and reference samples. Most of these methods are poorly suited to large data sets because of their complexity and sophistication. Furthermore, they either fall short of an acceptable sensitivity or generate large numbers of false calls. The first part of this thesis explores the existing methods and presents four new models to simplify the solution. The four models are based on the Band-Pass Wavelet Transform, the Uncovered Markov Model, the Uniformly Most Powerful Test, and the Maximum Likelihood Estimator. The four models achieve the highest sensitivity, the lowest false alarm rate, and the least complexity of all models.

The second part of the thesis presents a novel model for DCV detection using a single-channel approach. The model is based on the concept of sensor networks and can be used to analyze DNA samples from one or two channels. The model comprises three normalization techniques that remove the non-biological bias from the measurements. It then estimates the true distribution of the normal measurements by isolating their distribution from the heterogeneous mixture.
The complexity of calculating the probability of the average error is overcome by using the saddle-point approximation and the log-lattice design. The accuracy of the saddle-point approximation is demonstrated for both the two-channel and the single-channel approaches in homogeneous and non-homogeneous environments. The analysis includes both simulated and real-world datasets, and it explores recurrent DCV in large populations using the International HapMap Project datasets. The second part concludes by demonstrating the stationarity of the hybridization experiment and showing its impact on reducing the complexity of the analysis.

The third part of the thesis investigates patterns of DNA copy number variation. The human genetic network is a highly complex system in which hundreds, or even thousands, of DNA segments interact with one another, directly or indirectly, to control all the body's functions. A bottom-up subspace-clustering algorithm is presented to reveal the biological signature of two studied phenotypes: autism and lethal castration-resistant prostate cancer.

en
Electrical engineering
The effect of copy number variation on human phenotypes
Thesis or Dissertation