Browsing by Author "Weissman, Jon"
Now showing 1 - 20 of 23
Item: A GA-based Approach for Scheduling Decomposable Data Grid Applications (2004-02-18)
Authors: Kim, Seonho; Weissman, Jon
Data Grid technology promises geographically distributed scientists the ability to access and share physically distributed resources such as compute resources, networks, storage, and, most importantly, data collections for large-scale data-intensive problems. The massive size and distributed nature of these datasets pose challenges to data-intensive applications. Scheduling Data Grid applications must consider communication and computation simultaneously to achieve high performance. In many Data Grid applications, data can be decomposed into multiple independent sub-datasets and distributed for parallel execution and analysis. In this paper, we exploit this property and propose a novel Genetic Algorithm (GA)-based approach that automatically decomposes data onto communication and computation resources. The proposed GA-based scheduler takes advantage of the parallelism of decomposable Data Grid applications to achieve the desired performance level. We evaluate the proposed approach by comparing it with other algorithms. Simulation results show that the GA-based approach is a competitive choice for scheduling large Data Grid applications in terms of both scheduling overhead and solution quality.

Item: A Security-Enabled Grid System for Distributed Data Mining (2007-01-10)
Authors: Kim, Seonho; Kim, Jinoh; Weissman, Jon
In this paper, we present a Grid system that enables distributed data mining, exploration, and sharing to address issues involving distributed data analysis and multiple data ownership. The system addresses the three main requirements of distributed data mining on the Grid: 1) exploitation of geographically and organizationally distributed computing resources to solve data-intensive data mining problems, 2) ensuring the security and privacy of sensitive data, and 3) supporting seamless data and computing resource sharing. We present the system architecture, the specification of the component services, security considerations, a prototype of the system, and a performance evaluation.

Item: A Stochastic Control Model for Leasing Computational Resources (2004-02-20)
Authors: England, Darin; Weissman, Jon
We present a model for determining the optimal resource leasing policy for a dynamic grid service. The model assumes that the demand for the service as well as the actual execution times are unknown, but can be estimated. We cast the problem in a dynamic programming framework and show that the model can make good resource leasing decisions in the face of such uncertainties. In particular, we use the model to decide how many resources should be leased for the service and for how long. The results show that use of the model reduces the cost of leasing computational resources and significantly reduces the variance of that cost.
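The leasing model itself is not reproduced in the abstract; as a rough illustration of the dynamic-programming flavor of such a decision, the sketch below chooses a per-period lease level that trades holding cost against an expected shortfall penalty. The horizon, cost constants, and demand distribution are invented for illustration and are not the report's stochastic control model.

```python
# Hypothetical dynamic-programming sketch of a per-period leasing decision.
# The horizon, cost constants, and demand distribution are invented for
# illustration; this is not the report's stochastic control model.
from functools import lru_cache

PERIODS = 12                  # planning horizon, e.g. hours
MAX_NODES = 10                # most nodes we would consider leasing
LEASE_COST = 1.0              # cost of holding one leased node for one period
PENALTY = 4.0                 # cost per unit of unmet (estimated) demand
DEMAND_PMF = {2: 0.3, 4: 0.4, 6: 0.3}   # estimated per-period demand distribution

@lru_cache(maxsize=None)
def value(t):
    """Expected cost-to-go and best lease level for periods t..PERIODS-1."""
    if t == PERIODS:
        return 0.0, 0
    best_cost, best_n = float("inf"), 0
    for n in range(MAX_NODES + 1):
        shortfall = sum(p * max(d - n, 0) for d, p in DEMAND_PMF.items())
        stage = n * LEASE_COST + PENALTY * shortfall
        total = stage + value(t + 1)[0]
        if total < best_cost:
            best_cost, best_n = total, n
    return best_cost, best_n

cost, lease = value(0)
print(f"lease {lease} node(s) per period, expected total cost {cost:.1f}")
```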
Item: Accessibility-based Resource Selection in Loosely-coupled Distributed Systems (2007-11-20)
Authors: Kim, Jinoh; Chandra, Abhishek; Weissman, Jon
Large-scale distributed systems provide an attractive, scalable infrastructure for network applications. However, the loosely-coupled nature of this environment can make data access unpredictable and, in the limit, unavailable. Availability is normally characterized as a binary property, yes or no, often with an associated probability. However, availability conveys little in terms of expected data access performance. Using availability alone, jobs may suffer intolerable response times, or even fail to complete, due to poor data access. We introduce the notion of accessibility, a more general concept, to capture both availability and performance. An increasing number of data-intensive applications require consideration not only of node computation power but also of accessibility for adequate job allocation. For instance, selecting a node with intolerably slow connections can offset any benefit of running on a fast node. In this paper, we present accessibility-aware resource selection techniques that make it possible to choose nodes that will have efficient data access to remote data sources. We find that the local data access observations collected from a node's neighbors are sufficient to characterize accessibility for that node. We then present resource selection heuristics guided by this principle and show that they significantly outperform standard techniques. We also investigate the impact of churn, in which nodes change their participation status and lose their memory of prior observations. Despite this level of unreliability, we show that the suggested techniques yield good results.

Item: Awan: Locality-aware Resource Manager for Geo-distributed Data-intensive Applications (2016-03-10)
Authors: Jonathan, Albert; Chandra, Abhishek; Weissman, Jon
Today, many organizations need to operate on data that is distributed around the globe. This is inevitable due to the nature of data that is generated in different locations, such as video feeds from distributed cameras and log files from distributed servers. Although centralized cloud platforms have been widely used for data-intensive applications, such systems are not suitable for processing geo-distributed data due to high data transfer overheads. An alternative approach is to use an Edge Cloud, which reduces the network cost of transferring data by distributing its computations globally. While the Edge Cloud is attractive for geo-distributed data-intensive applications, extending existing cluster computing frameworks to a wide-area environment must account for locality. We propose Awan: a new locality-aware resource manager for geo-distributed data-intensive applications. Awan allows resource sharing between multiple computing frameworks while enabling high-locality scheduling within each framework. Our experiments with the Nebula Edge Cloud on PlanetLab show that Awan achieves up to a 28% increase in locality scheduling, which reduces the average job turnaround time by approximately 18% compared to existing cluster management mechanisms.
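As a rough sketch of what "high-locality scheduling" means in practice, the snippet below greedily places each task on a node that already holds a replica of its input, falling back to the node with the most free slots otherwise. The data structures and node names are hypothetical; Awan's actual scheduler and sharing interfaces are not shown in the abstract.

```python
# Rough sketch of locality-aware task placement: schedule each task where its
# input data already resides when possible. All names and structures here are
# hypothetical illustrations, not Awan's interfaces.

def schedule(tasks, node_slots, data_locations):
    """tasks: {task_id: input_file}; node_slots: {node: free_slots};
    data_locations: {input_file: set of nodes holding a replica}."""
    placement = {}
    for task, needed_file in tasks.items():
        local_nodes = [n for n in data_locations.get(needed_file, set())
                       if node_slots.get(n, 0) > 0]
        if local_nodes:                       # prefer a node that already has the data
            chosen = max(local_nodes, key=lambda n: node_slots[n])
        else:                                 # fall back to any node with capacity
            chosen = max(node_slots, key=node_slots.get)
        placement[task] = chosen
        node_slots[chosen] -= 1
    return placement

print(schedule({"t1": "a.log", "t2": "b.log"},
               {"eu-node": 2, "us-node": 1},
               {"a.log": {"eu-node"}, "b.log": {"us-node"}}))
```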
Item: Awan: Locality-aware Resource Manager for Geo-distributed Data-intensive Applications (2015-11-18)
Authors: Jonathan, Albert; Chandra, Abhishek; Weissman, Jon
Today, many organizations need to operate on data that is distributed around the globe. This is inevitable due to the nature of data that is generated in different locations, such as video feeds from distributed cameras and log files from distributed servers. Although centralized cloud platforms have been widely used for data-intensive applications, such systems are not suitable for processing geo-distributed data due to high data transfer overheads. An alternative approach is to use an Edge Cloud, which reduces the network cost of transferring data by distributing its computations globally. While the Edge Cloud is attractive for geo-distributed data-intensive applications, extending existing cluster computing frameworks to a wide-area environment must account for locality. We propose Awan: a new locality-aware resource manager for geo-distributed data-intensive applications. Awan allows resource sharing between multiple computing frameworks while enabling high-locality scheduling within each framework. Our experiments with the Nebula Edge Cloud on PlanetLab show that Awan achieves up to a 28% increase in locality scheduling, which reduces the average job turnaround time by approximately 20% compared to existing cluster management mechanisms.

Item: Constellation Plan B Report (2020-01-03)
Authors: Ramkumar, Aravind Alagiri; Sindhu, Rohit; Weissman, Jon
Data explosion has been exponential in the last decade. A new and major contributor to this phenomenon is the Internet of Things (IoT). With the technology becoming cheap and affordable, there has been a boom in the number of smart devices, which generate huge amounts of data. All of these devices may have varying connectivity and are built on disparate designs. IoT is one platform that can help us make use of the huge amount of data being generated by these heterogeneous devices. Since these devices are very different in design and architecture, we propose a system that provides a unified view of this heterogeneous world. We propose a dynamic P2P- and edge-based system named Constellation. This report presents the work pertaining to creating this network, the inherent dynamism involved in such a system, and ways to utilize the system to provide useful analysis by executing global and local tasks.

Item: Costs and Benefits of Load Sharing in the Computational Grid (2003-08-26)
Authors: England, Darin; Weissman, Jon
We present an analysis of the costs and benefits of load sharing of parallel jobs in the computational grid. We begin with a workload generation model that captures the essential properties of parallel jobs and use it as input to a grid simulation model. Our experiments are performed for both homogeneous and heterogeneous grids. We measure average job slowdown with respect to both local and remote jobs and show that, with some reasonable assumptions concerning the migration policy, load sharing proves to be beneficial when the grid is homogeneous, and that load sharing can adversely affect job slowdown for lightly-loaded machines in a heterogeneous grid. With respect to the number of sites in a grid, we find that the benefits obtained by load sharing do not scale well. Small to modest-size grids can employ load sharing as effectively as large-scale grids. We also present and evaluate an effective scheduling heuristic for migrating a job within the grid.
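For readers unfamiliar with the slowdown metric used in this kind of study, the following toy computation shows how average slowdown might be compared for locally run versus migrated jobs. The job records and migration overheads are fabricated numbers, not results from the paper.

```python
# Small illustration of the job-slowdown metric when comparing local and
# migrated (remote) jobs; all job records below are made up.

def slowdown(wait, run, migrate=0.0):
    """Slowdown = (wait + migration overhead + run) / run."""
    return (wait + migrate + run) / run

local_jobs = [(30.0, 120.0), (5.0, 600.0)]               # (wait, run) in seconds
remote_jobs = [(2.0, 120.0, 15.0), (1.0, 600.0, 40.0)]   # (wait, run, migration cost)

avg_local = sum(slowdown(w, r) for w, r in local_jobs) / len(local_jobs)
avg_remote = sum(slowdown(w, r, m) for w, r, m in remote_jobs) / len(remote_jobs)
print(f"avg local slowdown {avg_local:.2f}, avg remote slowdown {avg_remote:.2f}")
```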
Item: Dynamic Outsourcing Mobile Computation to the Cloud (2011-03-14)
Authors: Mei, Chonglei; Shimek, James; Wang, Chenyu; Chandra, Abhishek; Weissman, Jon
Mobile devices are becoming the universal interface to online services and cloud computing applications. Since mobile phones have limited computing power and battery life, there is a potential to migrate computation-intensive application components to external computing resources. The Cloud is an attractive platform for offloading due to elastic resource provisioning and the ability to support large-scale service deployment. In this paper, we discuss the potential for offloading mobile computation for different computation patterns and analyze their tradeoffs. Experiments show that we can achieve a 27-fold speedup for an image manipulation application and a 1.47-fold speedup for a face recognition application. In addition, this outsourcing model can also result in power savings depending on the computation pattern, offloading configuration, and execution environment.

Item: Early Experience with Mobile Computation Outsourcing (2010-08-25)
Authors: Ramakrishnan, Siddharth; Reutiman, Robert; Iverson, Harlan; Lian, Shimin; Chandra, Abhishek; Weissman, Jon
End-user mobile devices such as smart phones, PDAs, and tablets offer the promise of anywhere, anytime computing and communication. Users increasingly expect their mobile device to support the same activities (and the same performance) as their desktop counterparts: seamless multitasking, ubiquitous data access, social networking, game playing, and real work, while on the go. To support mobility, these devices are typically constrained in terms of their compute power, energy, and network bandwidth. Supporting the full array of desired desktop applications and behaviors while retaining the flexibility of the mobile device cannot always be achieved using local resources alone. To achieve this vision, additional resources must be easily harnessed on demand. In this paper, we describe opportunities for mobile outsourcing using external proxy resources. We present several application scenarios and preliminary results.

Item: Elastic Job Bundling: An Adaptive Resource Request Strategy for Large-Scale Parallel Applications (2015-04-16)
Authors: Liu, Feng; Weissman, Jon
In today's batch-queue HPC cluster systems, the user submits a job requesting a fixed number of processors. The system will not start the job until all of the requested resources become available simultaneously. When cluster workload is high, large jobs experience long waiting times due to this policy. In this paper, we propose a new approach that dynamically decomposes a large job into smaller ones to reduce waiting time, and lets the application expand across multiple subjobs while continuously making progress. This approach has three benefits: (i) application turnaround time is reduced, (ii) system fragmentation is diminished, and (iii) fairness is promoted. Our approach does not depend on job queue time prediction but exploits available backfill opportunities. Simulation results show that our approach can reduce application mean turnaround time by up to 48%.
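A minimal sketch of the bundling idea follows: rather than one request for the full processor count, the job is decomposed into smaller subjob requests that can start (and backfill) independently. The halving-style sizing rule is a placeholder assumption, not the paper's decomposition policy.

```python
# Hedged sketch of elastic job bundling: split one large processor request into
# smaller subjob requests. The sizing rule (halve until a minimum chunk) is an
# illustrative placeholder, not the paper's policy.

def decompose(total_procs: int, min_chunk: int):
    """Yield subjob sizes that together cover total_procs."""
    remaining = total_procs
    chunk = max(total_procs // 2, min_chunk)
    while remaining > 0:
        size = min(chunk, remaining)
        yield size
        remaining -= size
        chunk = max(chunk // 2, min_chunk)

print(list(decompose(1024, min_chunk=64)))   # e.g. [512, 256, 128, 64, 64]
```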
Item: Exploiting Heterogeneity for Collective Data Downloading in Volunteer-based Networks (2006-11-29)
Authors: Kim, Jinoh; Chandra, Abhishek; Weissman, Jon
Scientific computing is being increasingly deployed over volunteer-based distributed computing environments consisting of idle resources on donated user machines. A fundamental challenge in these environments is the dissemination of data to the computation nodes, with the successful completion of jobs being driven by the efficiency of collective data download across compute nodes, not only the individual download times. This paper considers the use of a data network consisting of data distributed across a set of data servers, and focuses on the server selection problem: how do individual nodes select a server for downloading data to minimize the communication makespan, the maximal download time for a data file? Through experiments conducted on a Pastry network running on PlanetLab, we demonstrate that nodes in a volunteer-based network are heterogeneous in terms of several metrics, such as bandwidth, load, and capacity, which impact their download behavior. We propose new server selection heuristics that incorporate these metrics, and demonstrate that these heuristics outperform traditional proximity-based server selection, reducing average makespans by at least 30%. We further show that incorporating information about download concurrency avoids overloading servers and improves performance by about 17-43% over heuristics that consider only proximity and bandwidth.

Item: Exploring the Throughput-Fairness Tradeoff of Deadline Scheduling in Heterogeneous Computing Environments (2008-01-31)
Authors: Sundaram, Vasumathi; Chandra, Abhishek; Weissman, Jon
The scalability and computing power of large-scale computational platforms that harness processing cycles from distributed nodes have made them attractive for hosting compute-intensive, time-critical applications. Many of these applications are composed of computational tasks that require specific deadlines to be met for successful completion. In scheduling such tasks, replication becomes necessary due to the heterogeneity and dynamism inherent in these computational platforms. In this paper, we show that combining redundant scheduling with deadline-based scheduling in these systems leads to a fundamental tradeoff between throughput and fairness. We propose a new scheduling algorithm called Limited Resource Earliest Deadline (LRED) that couples redundant scheduling with deadline-driven scheduling in a flexible way by using a simple tunable parameter to exploit this tradeoff. Our evaluation of LRED using trace-driven and synthetic simulations shows that LRED provides a powerful mechanism to achieve desired throughput or fairness under high loads and in low-timeliness environments, where these tradeoffs are most critical.

Item: Heterogeneity-Aware Workload Distribution in Donation Based Grids (2005-12-28)
Authors: Trivedi, Rahul; Chandra, Abhishek; Weissman, Jon
This paper explores the opportunities in porting a high-throughput Grid computing middleware to a high-performance, service-oriented environment. It exposes the limitations of the Grid computing middleware when operating in such a performance-sensitive environment and presents ways of overcoming these limitations. We focus on exploiting the heterogeneity of the Grid resources to meet the performance requirements of services and present several approaches to work distribution that deal with this heterogeneity. We present a heuristic for finding the optimal decomposition of work and present algorithms for each of the approaches, which we evaluate on a live testbed. The results validate the heuristic and compare the performance of the different workload distribution strategies. Our results indicate that a significant improvement in performance can be achieved by making the Grid computing middleware aware of the heterogeneity in the underlying infrastructure. The results also provide some useful insights into choosing a work distribution policy depending on the status of the Grid computing environment.
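To illustrate the basic idea of heterogeneity-aware work distribution, the sketch below assigns work units in proportion to each node's measured speed. The node names and speeds are fabricated, and the paper's own decomposition heuristic is not reproduced.

```python
# Illustrative proportional work decomposition across heterogeneous nodes: each
# node receives a share of work units proportional to its measured speed.
# Speeds are fabricated; this is not the paper's heuristic.

def distribute(total_units: int, speeds: dict) -> dict:
    total_speed = sum(speeds.values())
    shares = {n: int(total_units * s / total_speed) for n, s in speeds.items()}
    leftover = total_units - sum(shares.values())
    # hand any rounding leftover to the fastest node
    fastest = max(speeds, key=speeds.get)
    shares[fastest] += leftover
    return shares

print(distribute(1000, {"nodeA": 3.0, "nodeB": 1.5, "nodeC": 0.5}))
```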
Item: Hosting Services on the Grid: Challenges and Opportunities (2005-07-06)
Authors: Chandra, Abhishek; Trivedi, Rahul; Weissman, Jon
In this paper, we present the challenges to service hosting on the Grid using a measurement study on a prototype Grid testbed. For this experimental study, we have deployed the bioinformatics service BLAST from NCBI on PlanetLab, a set of widely distributed nodes, managed by the BOINC middleware. Our results indicate that the stateless nature of BOINC presents three major challenges to service hosting on the Grid: the need to deal with substantial computation and communication heterogeneity, the need to handle tight data and computation coupling, and the need to handle distinct, response-sensitive service requests. We present experimental results that illuminate these challenges and discuss possible remedies to make Grids viable for hosting emerging services.

Item: Mobilizing the Cloud: Enabling Multi-User Mobile Outsourcing in the Cloud (2011-11-21)
Authors: Mei, Chonglei; Taylor, Daniel; Wang, Chenyu; Chandra, Abhishek; Weissman, Jon
Mobile devices, such as smartphones and tablets, are becoming the universal interface to online services and applications. However, such devices have limited computational power and battery life, which limits their ability to execute rich, resource-intensive applications. Mobile computation outsourcing to external resources has been proposed as a technique to alleviate this problem. Most existing work on mobile outsourcing has focused either on single-application optimization or on outsourcing to fixed, local resources, with the assumption that wide-area latency is prohibitively high. In this paper, we present the design and implementation of an Android/Amazon EC2-based mobile application outsourcing platform, leveraging the cloud for scalability, elasticity, and multi-user code/data sharing. Using this platform, we empirically demonstrate that the cloud is not only feasible but desirable as an offloading platform for latency-tolerant applications despite wide-area latencies. Our platform is designed to dynamically scale to support a large number of mobile users concurrently by utilizing the elastic provisioning capabilities of the cloud, as well as by allowing reuse of common code components across multiple users. Additionally, we have developed techniques for detecting data sharing across multiple applications, and we propose novel scheduling algorithms that exploit such data sharing for better scalability and user performance. Experiments with our offloading platform show that the proposed techniques and algorithms substantially improve application performance, while achieving high efficiency in terms of resource and network usage.

Item: MPI-based Adaptive Parallel Grid Services (2003-08-26)
Authors: RaoAbburi, Lakshman; Weissman, Jon
This report presents the design and implementation of an adaptive MPI implementation (adaptive-MPI) that allows an MPI application to respond to changing CPU availability. An adaptive-MPI application can start sooner with fewer processors, opportunistically add processors later should they become available, and release processors to avoid suspension should the resource owner take them back. The behavior of adaptive-MPI is well suited to the unpredictable and dynamic nature of the Grid. We present results indicating that the system overhead of adaptive-MPI is small, and that performance benefits in terms of reduced waiting time and reduced completion time can be achieved relative to traditional MPI.
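Adaptive-MPI's API is not described in the abstract, so the sketch below uses plain Python (no MPI calls) only to illustrate the kind of block repartitioning an application would perform when processors are added or released.

```python
# Plain-Python sketch (no real MPI calls) of repartitioning a data range when
# the number of available processors changes, the kind of adjustment an
# adaptive application must make as processors are added or released.

def block_partition(n_items: int, n_procs: int):
    """Return [(start, end), ...] contiguous blocks, one per processor."""
    base, extra = divmod(n_items, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks

print(block_partition(100, 3))   # initial run on 3 processors
print(block_partition(100, 5))   # after 2 more processors become available
```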
Item: Nebula: Data Intensive Computing over Widely Distributed Voluntary Resources (2013-03-14)
Authors: Ryden, Mathew; Chandra, Abhishek; Weissman, Jon
Centralized cloud infrastructures have become the de facto platform for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources, and hence are highly unsuited for dispersed-data-intensive applications, where the data may be spread across multiple geographical locations. In this paper, we present Nebula: a dispersed cloud infrastructure that uses voluntary edge resources for both computation and data storage. We describe the lightweight Nebula architecture that enables distributed data-intensive computing through a number of optimizations, including location-aware data and computation placement, replication, and recovery. We evaluate Nebula's performance on an emulated volunteer platform that spans over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show that Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated BOINC.

Item: OPEN: Passive Network Performance Estimation for Data-intensive Applications (2008-11-24)
Authors: Kim, Jinoh; Chandra, Abhishek; Weissman, Jon
Distributed computing applications are increasingly utilizing distributed data sources. However, the unpredictable cost of data access in large-scale computing infrastructures can lead to severe performance bottlenecks. Providing predictability in data access is thus essential to accommodate the large set of newly emerging large-scale, data-intensive computing applications. In this regard, accurate estimation of network performance is crucial to meeting the performance goals of such applications. Passive estimation based on past measurements is attractive for its relatively small overhead compared to explicit probing. In this paper, we take a passive approach to network performance estimation. Our approach differs from existing passive techniques that rely either on past direct measurements between pairs of nodes or on topological similarities. Instead, we exploit secondhand measurements collected by other nodes without any topological restrictions. OPEN (Overlay Passive Estimation of Network performance) is a scalable framework providing end-to-end network performance estimation based on secondhand measurements. Using actual downloading traces collected over 10 months on PlanetLab, we show that OPEN provides low-overhead, accurate estimation for the replica and resource selection problems common to distributed computing. Results from our simulation study show that OPEN significantly outperforms selection techniques based on statistical pairwise estimation, as well as random and latency-based selection, in diverse experimental settings.
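A toy version of the secondhand-measurement idea appears below: a node estimates its likely download rate to a server from rates other nodes have already observed. The median aggregation and the trace format are assumptions for illustration, not OPEN's actual estimator.

```python
# Sketch of secondhand estimation: predict download performance to a server by
# aggregating observations made by other nodes. The aggregation (median) and
# the trace format are assumptions, not OPEN's estimator.
from statistics import median

# (observer_node, server, observed_Mbps) -- fabricated observations
observations = [
    ("peer1", "serverA", 8.2),
    ("peer2", "serverA", 6.9),
    ("peer3", "serverA", 7.5),
    ("peer1", "serverB", 2.1),
    ("peer4", "serverB", 3.0),
]

def estimate(server: str) -> float:
    rates = [mbps for _, s, mbps in observations if s == server]
    return median(rates) if rates else float("nan")

best = max({s for _, s, _ in observations}, key=estimate)
print(f"serverA ~ {estimate('serverA'):.1f} Mbps; pick {best} for the next download")
```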
Item: Reputation-Based Scheduling on Unreliable Distributed Infrastructures (2005-11-21)
Authors: Sonnek, Jason; Nathan, Mukesh; Chandra, Abhishek; Weissman, Jon
This paper presents a design and analysis of scheduling techniques to cope with the inherent unreliability and instability of worker nodes in large-scale, donation-based distributed infrastructures such as P2P and Grid systems. In particular, we focus on nodes that execute tasks via donated computational resources and may behave erratically or maliciously. We present a model in which reliability is not a binary property but a statistical one based on a node's prior performance and behavior. We use this model to construct several reputation-based scheduling algorithms that employ estimated reliability ratings of worker nodes for efficient task allocation. Our scheduling algorithms are designed to adapt to changing system conditions as well as non-stationary node reliability. Through simulation of a BOINC-like distributed computing infrastructure, we demonstrate that our algorithms can significantly improve throughput while maintaining a very high success rate of task completion. Our results also indicate that reputation-based scheduling can handle a wide variety of worker populations, including those with non-stationary behavior, with overhead that scales well with system size.
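As a hedged sketch of reputation-based redundant assignment, the snippet below rates each worker by a smoothed success fraction and grows a replica group until the estimated probability that at least one replica succeeds reaches a target. The rating formula, grouping rule, and target value are illustrative, not the paper's algorithms.

```python
# Minimal sketch of reputation-based redundant assignment: rate workers by past
# success fraction, then add replicas until the estimated chance that at least
# one returns a correct result meets a target. Illustrative only.

def reliability(successes: int, attempts: int) -> float:
    return (successes + 1) / (attempts + 2)          # Laplace-smoothed success rate

def pick_replicas(workers: dict, target: float = 0.99) -> list:
    """workers: {name: (successes, attempts)}; returns a replica group."""
    ranked = sorted(workers, key=lambda w: reliability(*workers[w]), reverse=True)
    group, p_all_fail = [], 1.0
    for w in ranked:
        group.append(w)
        p_all_fail *= 1.0 - reliability(*workers[w])
        if 1.0 - p_all_fail >= target:
            break
    return group

print(pick_replicas({"w1": (95, 100), "w2": (40, 100), "w3": (70, 100)}))
```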