Browsing by Subject "Big Data"
Now showing 1 - 5 of 5
Item: Discovering Hidden Patterns in Anesthesia Data Associated with Unanticipated Intensive Care Unit Admissions (2017-04). Peterson, Jessica.

Unanticipated intensive care unit admissions (UIA) are a metric of the quality of anesthesia care: they have been associated with intraoperative incidents, and patients admitted to the intensive care unit unexpectedly are nearly four times as likely to die within 30 days of surgery as patients who were not. Patient age, American Society of Anesthesiologists classification, type of procedure, tachycardia, hypotension, and cardiovascular and neuromuscular blocking drugs administered in the operating room have all been associated with UIA. Intraoperative anesthesia data are generated in real time and can be used to identify patterns in patient care associated with UIA, and knowledge of patterns in intraoperative medication administration and hemodynamic data is important for developing interventions that prevent intraoperative deterioration. This data visualization study discovered, labeled, and tested patterns in intraoperative hemodynamic management for association with UIA, where a pattern was defined as two or more characteristics appearing together in a line graph. Data from 68 adult, inpatient, elective surgical patients were matched to 34 patients with UIA in the University of Minnesota Academic Health Center Clinical Data Repository. A prototype line graph was evaluated to identify salient patterns in intraoperative hemodynamic management for the data set; line graphs were then created and visualized for patients with and without UIA, and patterns in intraoperative hemodynamic management were discovered and operationally defined. Odds ratios were used to test categorical patterns, and one-way analysis of variance was used to test continuous numeric patterns, for association with UIA. Seven patterns were significantly associated with UIA (p < .05).
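As a rough illustration of the two statistical tests named in the abstract above, the sketch below computes an odds ratio for a categorical pattern and a one-way analysis of variance for a continuous numeric pattern using SciPy. All counts and values are invented for illustration; they are not data from the study.

```python
# Hypothetical example of the two association tests described above.
from scipy.stats import fisher_exact, f_oneway

# Categorical pattern: 2x2 table of pattern present/absent counts
# for UIA vs. matched non-UIA patients (invented counts).
table = [[20, 14],   # UIA patients: pattern present, pattern absent
         [15, 53]]   # non-UIA patients: pattern present, pattern absent
odds_ratio, p_cat = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_cat:.4f}")

# Continuous numeric pattern: compare a numeric feature between groups
# with a one-way ANOVA (invented values, e.g., minutes of hypotension).
uia_minutes = [12, 30, 25, 8, 40, 22]
non_uia_minutes = [5, 10, 0, 15, 7, 3]
f_stat, p_num = f_oneway(uia_minutes, non_uia_minutes)
print(f"F = {f_stat:.2f}, p = {p_num:.4f}")
```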
Item: Efficient Data Management and Processing in Big Data Applications (2017-05). Cao, Xiang.

Today's Big Data applications generate huge amounts of data, and as data volumes grow rapidly, efficient approaches to managing and processing the data become essential. This thesis investigates data management and processing for Big Data applications. Key-value stores (KVS) are widely used in Big Data applications because they offer flexible and efficient performance. Recently, Seagate developed the Kinetic Drive, an Ethernet-accessed disk drive that stores key-value pairs directly and can reduce management complexity, especially in large-scale deployments. Key-value pairs must be stored on Kinetic Drives in an organized way, so this thesis presents data allocation schemes for a large-scale key-value store system built on Kinetic Drives: we investigate key indexing schemes, allocate data to drives accordingly, and propose efficient approaches to migrating data among drives. Managing huge numbers of key-value pairs also requires supporting attribute search for users, so we design a large-scale searchable key-value store system based on Kinetic Drives, investigating an indexing scheme that maps data to drives and proposing a key generation approach that reflects the metadata of the actual data and supports users' attribute search requests. Finally, MapReduce has become a very popular framework for processing data in many applications, and data shuffling usually accounts for a large portion of the total running time of MapReduce jobs. In recent years, scale-up computing architectures for MapReduce jobs have been developed: with a multi-processor, multi-core design connected via NUMAlink and large shared memories, the NUMA architecture provides powerful scale-up computing capability. This thesis focuses on optimizing the data shuffling phase of the MapReduce framework on NUMA machines, exploiting the varying bandwidth capacities of NUMAlinks among different memory locations to fully utilize the network. We investigate the NUMAlink topology, propose a topology-aware reducer placement algorithm to speed up the data shuffling phase, and extend the approach to a larger computing environment with multiple NUMA machines.
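The topology-aware placement idea above can be sketched as a simple scoring rule: run each reducer on the NUMA node with the greatest bandwidth-weighted proximity to the nodes holding its map output. The 4-node bandwidth matrix below is an illustrative assumption, not the thesis's measured NUMAlink topology or its actual algorithm.

```python
# A toy topology-aware reducer placement: pick the NUMA node that
# maximizes aggregate link bandwidth to this reducer's map outputs.

# bandwidth[i][j]: assumed relative bandwidth between nodes i and j
bandwidth = [
    [100,  40,  20,  20],
    [ 40, 100,  20,  20],
    [ 20,  20, 100,  40],
    [ 20,  20,  40, 100],
]

def place_reducer(map_bytes_per_node):
    """Return the node with the highest bandwidth-weighted score,
    where map_bytes_per_node[i] is the bytes of this reducer's
    input currently sitting on node i."""
    nodes = range(len(bandwidth))
    def score(node):
        return sum(bandwidth[node][src] * map_bytes_per_node[src]
                   for src in nodes)
    return max(nodes, key=score)

# A reducer whose input lives mostly on nodes 2 and 3 is placed on
# node 2, keeping most shuffle traffic on fast local links.
print(place_reducer([1, 1, 8, 6]))  # -> 2
```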
Item: Interview with Constantin Aliferis (2015-06-08). Aliferis, Constantin F.; Tobbell, Dominique.

Constantin Aliferis begins by discussing his educational background, including his early interest in biomedical and health informatics. He describes the main focus of his research since graduate school, which has included machine learning and the analysis of complex and high-dimensional data sets; scientometrics and information retrieval; and model building, analysis, and knowledge discovery across a variety of disease domains. Aliferis goes on to briefly discuss his tenure at Vanderbilt University, followed by a more detailed discussion of his tenure at New York University. Next, Aliferis offers his definition of precision medicine. The remainder of the interview focuses on health informatics at the University of Minnesota. Aliferis describes his vision for the Institute for Health Informatics, reflects on the strong backing provided by the leadership of the University and the University's Academic Health Center to support this vision, and offers his perspective on the future of the field of biomedical and health informatics.

Item: Scalable and Ensemble Learning for Big Data (2019-05). Traganitis, Panagiotis.

The turn of the decade has marked society and computing research with a "data deluge." As the number of smart, highly accurate, and Internet-capable devices increases, so does the amount of data that is generated and collected. While this sheer amount of data has the potential to enable high-quality inference and information mining, it introduces numerous challenges in processing and pattern analysis, since available statistical inference and machine learning approaches do not necessarily scale well with the volume and dimensionality of the data. In addition to scalability challenges, gathered data are often noisy, dynamic, contaminated by outliers, or corrupted specifically to inhibit the inference task, and many machine learning approaches have been shown to be susceptible to adversarial attacks. At the same time, the cost of cloud and distributed computing is rapidly declining. There is therefore a pressing need for statistical inference and machine learning tools that are robust to attacks and that scale with the volume and dimensionality of the data by efficiently harnessing the available computational resources. This thesis is centered on analytical and algorithmic foundations that aim to enable statistical inference and data analytics from large volumes of high-dimensional data. The vision is to establish a comprehensive framework, based on state-of-the-art machine learning, optimization, and statistical inference tools, that enables truly large-scale inference, taps into the available (possibly distributed) computational resources, and is resilient to adversarial attacks. The ultimate goal is to demonstrate, both analytically and numerically, how valuable insights from signal processing can lead to markedly improved and accelerated learning tools. To this end, the thesis investigates two main research thrusts: i) large-scale subspace clustering and ii) unsupervised ensemble learning. Both thrusts introduce novel algorithms that tackle the issues of large-scale learning, and the potential of the proposed algorithms is showcased through rigorous theoretical results and extensive numerical tests.

Item: SpatialHadoop: A MapReduce Framework for Big Spatial Data (2016-06). Eldawy, Ahmed.

There has been a recent explosion in the amount of spatial data produced by devices such as smartphones, satellites, space telescopes, and medical devices. The variety of this spatial data makes it widely used in important applications such as brain simulation, identifying cancer clusters, tracking infectious diseases and drug addiction, simulating climate change, and event detection and analysis. While several distributed systems are designed to handle Big Data in general, e.g., Hadoop, Hive, Spark, and Impala, they all fall short of supporting spatial data efficiently. As a result, there are major research efforts to either extend these systems or build new ones that efficiently support Big Spatial Data. This thesis describes SpatialHadoop, a full-fledged system for spatial data that extends the core of Hadoop to support spatial data efficiently. SpatialHadoop is available as open-source software and has already been downloaded around 80,000 times. It consists of four main layers: language, indexing, query processing, and visualization. In the language layer, SpatialHadoop provides a high-level language, termed Pigeon, which offers standard spatial data types and query processing so that non-technical users can access the system easily. The indexing layer provides efficient spatial indexes, such as grid, R-tree, R+-tree, and Quad-tree, which organize the data in the distributed file system. The indexes follow a two-level design: one global index partitions the data across machines, and multiple local indexes organize the records within each machine. The query processing layer encapsulates a set of spatial operations that ship with SpatialHadoop, including basic spatial operations, join operations, and computational geometry operations. The visualization layer allows users to explore big spatial data by generating images that provide a bird's-eye view of the data. SpatialHadoop is already used as the backbone of several real systems, including SHAHED, a web-based application for interactive exploration of satellite data.
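The two-level index design described in this last abstract can be illustrated with a toy global grid index over partitions plus sorted local buckets. This is a minimal sketch of the idea under assumed data, not SpatialHadoop's actual implementation; the grid size, points, and query are invented.

```python
# A toy two-level spatial index: a global grid assigns each record
# to a partition, and a local index organizes records inside it.
from collections import defaultdict

GRID = 4  # assumed 4x4 global grid over a [0,1) x [0,1) space

def global_cell(x, y):
    """Global index: map a point to its grid partition."""
    return (int(x * GRID), int(y * GRID))

# Partition the data set, one bucket per grid cell (each bucket
# standing in for a partition stored on one machine).
partitions = defaultdict(list)
for p in [(0.12, 0.80), (0.15, 0.83), (0.90, 0.10)]:
    partitions[global_cell(*p)].append(p)

# Local index: keep each partition's records sorted (a stand-in for
# SpatialHadoop's R-tree/Quad-tree local indexes).
local_index = {cell: sorted(recs) for cell, recs in partitions.items()}

def range_query(x0, y0, x1, y1):
    """Prune with the global index, then scan only matching cells."""
    hits = []
    for cx in range(int(x0 * GRID), int(x1 * GRID) + 1):
        for cy in range(int(y0 * GRID), int(y1 * GRID) + 1):
            hits += [(x, y) for (x, y) in local_index.get((cx, cy), [])
                     if x0 <= x <= x1 and y0 <= y <= y1]
    return hits

print(range_query(0.1, 0.75, 0.2, 0.9))  # the two nearby points
```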