Browsing by Subject "Cloud Computing"
Now showing 1 - 4 of 4
Item Multi-Tenant Geo-Distributed Data Analytics (2019-07) Jonathan, Albert
Geo-distributed data analytics has gained much interest in recent years due to the need to extract insights from geo-distributed data. Traditionally, data analytics has been done within a cluster/data center environment. However, analyzing geo-distributed data using existing cluster-based systems typically cannot satisfy the timeliness requirement of most applications and results in wasteful resource consumption due to the fundamental differences between the environments, especially the scarce, highly heterogeneous, and dynamic nature of the wide-area resources: compute power and network bandwidth. This thesis addresses the challenges faced by geo-distributed data analytics systems in ensuring high-performance and reliable execution of multiple data analytics applications/queries. Specifically, the focus is on sharing resources across multiple users, applications, and computing frameworks. Sharing resources is attractive as it increases resource utilization and reduces operational cost. However, ensuring high-performance execution of multiple applications in a shared environment is challenging as they may compete for the same resources, especially in a wide-area environment with scarce resources. Furthermore, dynamics such as workload variation, resource variation, stragglers, and failures are inevitable in large-scale distributed systems. These can cause large resource perturbations that significantly affect the performance of query executions. This thesis makes the following contributions. First, we present a resource sharing technique across multiple geo-distributed data analytics frameworks. The main challenge here is how to elastically partition resources while allowing high-locality scheduling for each individual framework, which is critical to the execution performance of geo-distributed analytics queries.
We then address the problem of how to identify and exploit common executions across multiple queries to mitigate wasteful resource consumption. We demonstrate that traditional multi-query optimization may degrade the overall query execution performance due to its lack of support for network awareness. Finally, we highlight the importance of adaptability in ensuring reliable query execution in the presence of dynamics, both for single and multiple query executions. We propose a systematic approach that can selectively determine which queries to adapt and how to adapt them based on the types of queries, dynamics, and optimization goals.

Item On High Performance Cloud Based File Synchronization with User Collaboration (2016-07) Chillamcherla, Mounika
Over the past few years, cloud-based file storage/synchronization systems like Dropbox, Gdrive and Skydrive have achieved tremendous success among internet users. This new generation of service, beyond conventional client/server or peer-to-peer file hosting with storage only, provides reliable file storage and effective file synchronization for diverse user collaborations. In this thesis, we take a close look at such cloud-based file synchronization and collaboration systems. Using Dropbox as a case study, our real-world measurement carefully decomposes its file synchronization protocol into different stages: pre-processing, uploading, downloading, and post-processing. We show that this series of computation and communication operations, which is far more complicated than those in conventional file hosting, is necessary for Dropbox-like services, especially considering the cloud deployment. Such a design can significantly improve service reliability and avoid possible task interference on cloud-based virtual machines (VMs). Unfortunately, these operations also lead to higher latency and cost.
In particular, the variance of latency across different users increases with a larger population, and thus individual users may face severe performance degradation as the system scale grows. Moreover, we also notice that Dropbox assumes that its users are not online at the same time. The files are therefore uploaded to a cloud storage server and then pushed to the destination. It is easy to see that such a design is inefficient when some of the Dropbox users are online at the same time. To address this problem, we propose an enhancement that lets Dropbox detect users' online status and decide whether the file can be sent to them directly. We tested our prototype on PlanetLab and the evaluation indicates that the design can greatly reduce the file synchronization latency with minimal system overhead.

Item Power Consumption of Virtual Machines in Cloud Computing: Measurement and Enhancement (2016-07) BAI, YAN
Virtualization is one of the cornerstone technologies that make utility computing platforms such as cloud computing a reality. With the accelerating adoption of cloud computing, virtualization-based cloud platforms are consuming a significant amount of energy. However, the design of a green and efficient virtualization technology remains an open issue for both industry and academia. In this thesis, we investigate for the first time the virtual machine's (VM's) power consumption while supporting different services and applications (e.g., web, database and streaming). In particular, we establish a cloud computing platform at the University of Minnesota Duluth. This platform consists of both Xen and KVM nodes, and the VMs can be easily accessed from the Internet. Our real-world measurement indicates that the existing virtualization technologies add considerable energy overhead to data centers. For example, a busy virtualized database server can consume 30% more energy than its non-virtualized counterpart.
To address this problem, we propose a shared-memory-based enhancement to reduce the extra interrupts and memory copies for cloud virtualization. The evaluation indicates that our approach can reduce a VM's energy consumption by 11% without noticeable loss of its running performance.

Item Transaction and data consistency models for cloud applications (2014-02) Padhye, Vinit A.
The emergence of cloud computing and large-scale Internet services has given rise to new classes of data management systems, commonly referred to as NoSQL systems. NoSQL systems provide high scalability and availability; however, they provide only a limited form of transaction support and weak consistency models. There are many applications that require more useful transaction and data consistency models than those currently provided by NoSQL systems. In this thesis, we address the problem of providing scalable transaction support and appropriate consistency models for cluster-based as well as geo-replicated NoSQL systems. The models we develop in this thesis are founded upon the snapshot isolation (SI) model, which has been recognized as attractive for scalability. In supporting transactions on cluster-based NoSQL systems, we introduce a notion of decoupled transaction management in which transaction management functions are decoupled from the storage system and integrated with the application layer. We present two system architectures based on this concept. In the first system architecture, all transaction management functions are executed in a fully decentralized manner by the application processes. The second architecture is based on a hybrid approach in which the conflict detection functions are performed by a dedicated service. Because the SI model can lead to non-serializable transaction executions, we investigate two approaches for ensuring serializability.
We perform a comparative evaluation of the two architectures and of the two approaches for guaranteeing serializability, and demonstrate their scalability. For transaction management in geo-replicated systems, we propose an SI-based transaction model, referred to as causal snapshot isolation (CSI), which provides causal consistency using asynchronous replication. The causal consistency model provides more useful consistency guarantees than the eventual consistency model. We build upon the CSI model to provide an efficient transaction model for partially replicated databases, addressing the unique challenges that partial replication raises in supporting snapshot isolation and causal consistency. Through experimental evaluations, we demonstrate the scalability and performance of our mechanisms.
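
As a rough illustration of the snapshot isolation ideas in the abstract above, the following minimal Python sketch shows a first-committer-wins write-conflict check of the kind a dedicated conflict detection service might perform, and why SI alone can still admit non-serializable executions (write skew). It is not taken from the thesis: the class `ConflictDetector`, its methods `begin` and `try_commit`, and the logical-clock scheme are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConflictDetector:
    """Hypothetical stand-in for a dedicated conflict detection service."""
    commit_ts: int = 0                               # logical time of the last commit
    last_writer: dict = field(default_factory=dict)  # key -> commit time of latest write

    def begin(self) -> int:
        # A transaction reads from the snapshot as of the latest commit.
        return self.commit_ts

    def try_commit(self, snapshot_ts: int, write_set: set) -> bool:
        # First-committer-wins: abort if any key in the write set was
        # committed by another transaction after this snapshot was taken.
        for key in write_set:
            if self.last_writer.get(key, 0) > snapshot_ts:
                return False
        self.commit_ts += 1
        for key in write_set:
            self.last_writer[key] = self.commit_ts
        return True


detector = ConflictDetector()

# Two concurrent transactions writing the same key: the second one aborts.
t1 = detector.begin()
t2 = detector.begin()
print(detector.try_commit(t1, {"x"}))  # True
print(detector.try_commit(t2, {"x"}))  # False

# Write skew: both transactions read x and y from the same snapshot but
# write disjoint keys, so the write-set check lets both commit.
t3 = detector.begin()
t4 = detector.begin()
print(detector.try_commit(t3, {"x"}))  # True
print(detector.try_commit(t4, {"y"}))  # True
```

The second pair of transactions is the case that motivates additional serializability checks: the write sets do not overlap, so a check on write sets alone cannot detect that each transaction may have read a value the other overwrote.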