Author: Wei, Weiqi
Date issued: 2012-02
Date accessioned: 2012-02-22
Date available: 2012-02-22
URI: http://purl.umn.edu/120984
Title: The impact of data fragmentation on high-throughput clinical phenotyping
Type: Thesis or Dissertation
Language: en-US
Subjects: Data fragmentation; Data mining; Electronic medical record; High-throughput clinical phenotyping; Natural language processing; Terminology; Health Informatics
Description: University of Minnesota Ph.D. dissertation, January 2012. Major: Health Informatics. Advisor: Christopher G. Chute. 1 computer file (PDF); v, 95 pages, appendix p. 95.

Abstract: Subject selection is essential to clinical research and has become the rate-limiting step in harvesting knowledge to advance healthcare. Current manual approaches prevent researchers from conducting deep and broad studies and from drawing confident conclusions. High-throughput clinical phenotyping (HTCP), a recently proposed approach, leverages the machine-processable content of electronic medical records (EMRs) to make this otherwise inefficient process of subject selection scalable and practical. However, the ability to capture a patient's medical data is often limited by data fragmentation problems common in current EMR systems, i.e., different data types (structured vs. unstructured), heterogeneous data sources (a single medical center vs. multiple healthcare centers), and varying time frames (short vs. long). The effect of data fragmentation on HTCP remains unknown. In this dissertation, by taking advantage of the REP patient-record-linkage system and the richness of EMR data at Mayo Clinic, I provide a multidimensional and thorough demonstration of how data fragmentation affects HTCP. The predominant message this dissertation delivers to the health informatics field is that EMR data fragmentation has a substantial influence on HTCP. Clinical researchers should carefully consider and mitigate this risk in the secondary and meaningful use of EMR data, especially when developing or executing an HTCP algorithm for subject selection.