Subject selection is essential to clinical research and has become the rate-limiting step
in harvesting knowledge to advance healthcare. Current manual
approaches prevent researchers from conducting deep and broad studies and from drawing
confident conclusions. High-throughput clinical phenotyping (HTCP), a recently
proposed approach, leverages machine-processable content from electronic medical
records (EMRs) to make this otherwise inefficient subject-selection process scalable.
However, the ability to capture a patient's complete medical data is often limited by
data fragmentation problems common in current EMR systems, namely
differing data types (structured vs. unstructured), heterogeneous data sources (a single
medical center vs. multiple healthcare centers), and varying time frames (short
vs. long). The effect of data fragmentation on HTCP remains unknown.
In this dissertation, by taking advantage of the REP patient-record-linkage
system and the richness of EMR data at Mayo Clinic, I provide a multidimensional and
thorough demonstration of how data fragmentation affects HTCP. The central
message this dissertation delivers to the health informatics field can be
summarized as follows: EMR data fragmentation has a substantial influence on HTCP. Clinical researchers should carefully consider and mitigate this risk in the
secondary, meaningful use of EMR data, especially when developing or executing an
HTCP algorithm for subject selection.