Scalable computation and analysis of elementary flux modes in metabolic networks

Jevremovic, Dimitrije2013-07-222013-07-222013-05https://hdl.handle.net/11299/153498University of Minnesota Ph.D. dissertation. Ph.D. May 2013. Major:Computer science. Advisor: Dr. Daniel Boley. 1 computer file (PDF); xii, 148 pages, appendices A-B.The full in silico reconstruction of the genome-scale cellular metabolic networks in the recent years was enabled and supported with the sequencing of genomic and transcriptomic data across numerous organism species, as well as with the growth of various biological databases. Metabolic networks provide a comprehensive overview of the cellular landscape and its phenotype prediction for the given environmental and growth conditions. Imposing the thermodynamic and steady state constraints on the reactions and internal metabolites of the given metabolic network, respectively, one still has a justifiable model which can be used to study the cellular events of interest. Metabolic pathway is introduced as a biological and mathematical concept of a subset of active reactions with a non-zero flux. In particular, elementary flux mode is a metabolic pathway with a physiological property reflected in the requirement for its enzymatic minimality. Elementary flux modes, as building blocks of the metabolic network, have numerous applications in chemical engineering and biochemistry for the study of phenotype of wild type and mutant cells. However, the large number of elementary flux modes for even medium size metabolic networks, as well as the high computational cost of the underlying Nullspace Algorithm, still present a challenge in the computation and analysis.The problem of enumerating elementary flux modes is equivalent to the one of enumerating the vertices in the degenerate polytope which still remains an open problem. Algorithm used in such enumeration is the Double Description Method, and its adaptation in the metabolic network analysis is the so-called Nullspace Algorithm. The Nullspace Algorithm is studied to identify its major bottlenecks and possible improvements in design and implementation. The reduced algebraic rank test is proposed in the event when the network has reversible reactions, and it significantly reduces the time required to check the elementarity of the candidate elementary flux mode. In the case when the network admits reversible metabolic pathway, a procedure is given to use the Nullspace Algorithm to compute one of many possible minimal generating sets, itself being a minimal subset of elementary flux modes whose linear combination can fully characterize the solution space . Initially, parallelization of the enumeration of elementary modes was proposed in the form of Combinatorial Parallel Nullspace Algorithm using MPI library routines for the message-passing distributed memory environment. Due to insufficient memory scalability and need to replicate major data structures across all the compute nodes, further steps were undertaken to attain both memory and time scalability and be able to fit larger metabolic networks to the algorithm. Global Arrays, the partitioned global address space library, was used to facilitate the development of the distributed-memory algorithm with a shared-memory view of the global memory. It allowed a merge of the concept of the shared-memory programming within a distributed-memory environment, hence allowing the computation of nearly 70 million elementary flux modes for the 83-reaction metabolic network of the Sacharomyces cerevisiae.In the event of limited memory and processor resources, a splitting of work needed to compute all elementary flux modes was needed. An earlier divide-and-conquer idea was taken and further developed into a functional implementation. It was merged with the Combinatorial Parallel Nullspace Algorithm to yield the Combined Parallel Nullspace Algorithm and demonstrated reduced memory footprint on each of the subproblems. The full set of elementary flux modes was used to enumerate multiple subsets of reactions as candidates for knockout which would collapse the metabolic network to the subset of desired elementary flux modes. Metabolic network design goals of (1) network flexibility and (2) growth-coupled production of the target chemical were incorporated within the branch-and-bound breadth-first search algorithms. Algorithms were executed on the metabolic networks of Escherichia coli and Sacharomyces cerevisae and were compared to an earlier exhaustive variation of the Berge's algorithm. Finally, a collection of ideas based on non-linear optimization methods is proposed to model the cellular phenotype with a proper objective function and infer the efficient subnetwork of the given metabolic network. Proposed methods are used with the goal of inferring reactions targeted for deletion, where the residual network will contain as many efficient elementary flux modes as possible, thus reducing the cost of running one of the previously implemented algorithms. Optimization methods aim to model the metabolic goals of the minimal energy consumption and enzymatic minimality using the L1-regularized quadratic programming framework.en-USScalable computation and analysis of elementary flux modes in metabolic networksThesis or Dissertation