Zare Sangederazi, Hossein
2009-06-17
2009-06-17
2009-04
https://hdl.handle.net/11299/51274
University of Minnesota Ph.D. dissertation. April 2009. Major: Electrical Engineering. Advisors: Prof. Mostafa Kaveh and Prof. Arkady Khodursky. 1 computer file (PDF); xiv, 116 pages. Ill. (some col.)
As genomic technology and sequencing projects continue to advance, more emphasis needs to be put on data analysis and on how best to extract information from diverse data sets. For example, functional annotation of new genes can no longer depend only on sequence analysis, but requires integration of additional sources of information, including phylogeny, gene expression, protein interaction, and metabolic and regulatory networks. New biological discoveries will therefore depend strongly on our ability to combine these diverse data sets. We demonstrate how information from gene expression, regulatory sequence patterns, and location data can be combined to discover regulatory modules and to construct gene transcriptional regulatory networks. In the context of modeling regulatory sequences, we propose a higher-order probabilistic model to efficiently discriminate between the binding sites of a transcription factor and non-specific DNA sequences. Moreover, we develop a model-based algorithm that integrates gene expression data, modeled by mixtures of Gaussians, with regulatory sequence patterns to cluster functionally related genes. For the construction of the gene regulatory network, we introduce the concept of Gene-Regulon association, in contrast to Gene-Gene interaction. Unlike Gene-Gene interaction methods, in which the mRNA levels of the regulators play the central role, Gene-Regulon methods rely on the activity profiles of the transcription factors. In the absence of direct measurements, these activity profiles are estimated concurrently via a computational model.
We develop a model selection algorithm capable of capturing the activity profile of a transcription factor from the transcriptional activity of its target genes. In addition, we present a data-driven approach based on nonlinear kernel embedding for capturing the nonlinear correlation and geometric connectivity patterns in gene expression data. We apply these methods to integrate gene expression and interaction data and to construct a network of transcriptional regulation in Escherichia coli (E. coli).
en-US
Bioinformatics
Clustering
Computational Biology
Gene Regulatory Network
Genomic Signal Processing
Electrical Engineering
Integrated analysis of genomic data for inferring gene regulatory networks.
Thesis or Dissertation