Jonathan, Albert; Chandra, Abhishek; Weissman, Jon
2020-09-02
2020-09-02
2015-11-18
https://hdl.handle.net/11299/215983

Today, many organizations need to operate on data that is distributed around the globe. This is inevitable because the data is generated in different locations: video feeds from distributed cameras, log files from distributed servers, and many others. Although centralized cloud platforms have been widely used for data-intensive applications, such systems are not suitable for processing geo-distributed data due to high data transfer overheads. An alternative approach is to use an Edge Cloud, which reduces the network cost of transferring data by distributing its computations globally. While the Edge Cloud is attractive for geo-distributed data-intensive applications, extending existing cluster computing frameworks to a wide-area environment must account for locality. We propose Awan: a new locality-aware resource manager for geo-distributed data-intensive applications. Awan allows resource sharing between multiple computing frameworks while enabling high locality scheduling within each framework. Our experiments with the Nebula Edge Cloud on PlanetLab show that Awan achieves up to a 28% increase in locality scheduling, which reduces the average job turnaround time by approximately 20% compared to existing cluster management mechanisms.

en-US
Awan: Locality-aware Resource Manager for Geo-distributed Data-intensive Applications
Report