Kim, Jinoh; Chandra, Abhishek; Weissman, Jon
2020-09-02
2020-09-02
2008-11-24
https://hdl.handle.net/11299/215784
Distributed computing applications are increasingly utilizing distributed data sources. However, the unpredictable cost of data access in large-scale computing infrastructures can lead to severe performance bottlenecks. Providing predictability in data access is thus essential to accommodate the large set of newly emerging large-scale, data-intensive computing applications. In this regard, accurate estimation of network performance is crucial to meeting the performance goals of such applications. Passive estimation based on past measurements is attractive for its relatively small overhead compared to relying on explicit probing. In this paper, we take a passive approach for network performance estimation. Our approach is different from existing passive techniques that rely either on past direct measurements of pairs of nodes or on topological similarities. Instead, we exploit secondhand measurements collected by other nodes without any topological restrictions. OPEN (Overlay Passive Estimation of Network performance) is a scalable framework providing end-to-end network performance estimation based on secondhand measurements. Using actual downloading traces collected for 10 months in PlanetLab, we show that OPEN provides low-overhead, accurate estimation for replica and resource selection problems common to distributed computing. Results from our simulation study show that OPEN significantly outperforms selection techniques based on statistical pairwise estimations as well as random and latency-based selections in diverse experimental settings.
en-US
OPEN: Passive Network Performance Estimation for Data-intensive Applications
Report