Mining high-dimensional bioprocess and gene expression data for enhanced process performance


Thesis or Dissertation


Over the past few decades, recombinant protein therapeutics produced in cultured mammalian cells have fundamentally transformed modern medicine and improved millions of patients' lives. The drastic increases in product concentration and in the number of products approved by the US Food and Drug Administration (FDA) have been attributed largely to the relentless efforts of the entire pharmaceutical community on multiple technological fronts. The remarkable advances in high-throughput genomic and process analytical tools in recent years have allowed us to extensively characterize almost all steps along a typical cell culture process. The massive amount of data generated by these technologies harbors vital information about the process, yet presents substantial challenges due to its exceptionally high dimensionality. This thesis research applied advanced multivariate approaches to explore these data sets and to understand the profound cellular changes that occur during various development and manufacturing stages.

Through mining a large set of manufacturing data, we uncovered a "memory" effect, suggesting that the final outcome of a production culture is primarily affected by the early seed culture. Several parameters related to lactate metabolism and cell growth were identified as having a pivotal influence on process performance. Furthermore, transcriptome analysis of cells undergoing selection and amplification was performed using multiple statistical, clustering, and functional analysis methods. Profound transcriptional changes were discerned, from which a combined hyper-productivity gene set involving cell cycle control, signaling, and protein processing and secretion was derived. These differentially expressed genes present promising targets for cellular modulation to enhance process performance.

We further developed a novel genetic tool to engineer the expression dynamics of these genes. A large number of genes with time-dynamic expression trends were identified through mining time-series transcriptome data. The promoters of these genes offer an effective means to drive the expression profiles of the targets in a dynamic manner. The systems approaches outlined in this research thus hold promise to deepen our understanding of process characteristics and open new avenues for process improvement.


University of Minnesota Ph.D. dissertation. July 2012. Major: Chemical Engineering. Advisor: Professor Wei-Shou Hu. 1 computer file (PDF); xi, 178 pages, appendix p. 160-178.


Suggested citation

Le, Huong Thi Ngoc. (2012). Mining high-dimensional bioprocess and gene expression data for enhanced process performance. Retrieved from the University Digital Conservancy,
