Browsing by Subject "Big Data Analytics"


Exploiting Heterogeneous Resources in a Multi-cloud Environment
(2019-07) Oh, Kwangsung
Today, we increasingly rely on Internet services and applications to automate many daily activities. As of 2019, for example, 118 million people watch Netflix in their free time, 150 million people use AirBnB to find places to stay while traveling, and 75 million people rely on Uber to find a ride. For data locality, low latency, and availability, many applications use diverse cloud resources (e.g., storage, network, and compute) in multiple geo-distributed data centers (DCs) of public cloud providers such as Amazon, Microsoft, and Google. Applications gain even more resource options if they use these cloud providers’ DCs together. Most cloud providers offer heterogeneous cloud resources (e.g., memory, SSD, disk, and archival storage for cloud storage) that allow applications to choose a resource based on their requirements and demands. Exploiting such heterogeneous resources, however, adds significant complexity to applications because each cloud resource option has different interfaces, data models, pricing policies, and geographical locations. This heterogeneity allows applications to trade off among different metrics, e.g., latency, availability, and monetary cost. To maximize the benefits of heterogeneous cloud resources, applications must answer the question: "What is the best cloud resource configuration (which data centers and which cloud resources) to achieve our goals at minimal monetary cost?" Answering this question is challenging, however, because the answer differs for each application depending on its goals, e.g., SLA (performance), cost budget, consistency model, and degree of fault tolerance.
Adding to the challenges, dynamics from a multi-cloud environment (e.g., network outages and bandwidth/latency fluctuation) and from applications (e.g., users’ locations, demand, and data access patterns) make it nearly impossible to determine the best cloud resource configuration statically. This thesis answers three questions: how to exploit heterogeneous cloud resources easily, how to determine optimal cloud resource configurations, and how to handle dynamics. It addresses these challenges in a multi-cloud environment by building three novel and usable systems: a policy-driven geo-distributed cloud storage system called Wiera, an automated multi-tiered geo-distributed data placement system called TripS, and a network cost-aware geo-distributed data analytics system called Kimchi.

University Digital Conservancy
