Accessibility-based Resource Selection in Loosely-coupled Distributed Systems
Kim, Jinoh; Chandra, Abhishek; Weissman, Jon
Report, issued 2007-11-20; accessioned and available 2020-09-02
https://hdl.handle.net/11299/215742
Language: en-US

Large-scale distributed systems provide an attractive, scalable infrastructure for network applications. However, the loosely-coupled nature of this environment can make data access unpredictable and, in the limit, unavailable. Availability is normally characterized as a binary property (yes or no), often with an associated probability, but it conveys little about expected data access performance. Using availability alone, jobs may suffer intolerable response times, or even fail to complete, due to poor data access. We introduce the notion of accessibility, a more general concept that captures both availability and performance. An increasing number of data-intensive applications require job allocation to consider not only node computation power but also accessibility. For instance, selecting a node with intolerably slow connections can offset any benefit of running on a fast node. In this paper, we present accessibility-aware resource selection techniques that make it possible to choose nodes with efficient access to remote data sources. We show that the local data access observations collected from a node's neighbors are sufficient to characterize accessibility for that node. We then present resource selection heuristics guided by this principle, and show that they significantly outperform standard techniques. We also investigate the impact of churn, in which nodes change their participation status and thereby lose their memory of prior observations. Despite this level of unreliability, we show that the suggested techniques yield good results.
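The trade-off between compute power and data access can be illustrated with a minimal Python sketch. This is not the report's heuristics. It assumes, hypothetically, that a candidate node's data-access time can be estimated as the mean of the access times its neighbors observed for the same remote data source, and that selection minimizes estimated compute time plus estimated access time. All function names, node names, and numbers are illustrative.

```python
from statistics import mean


def estimate_access_time(neighbor_obs: list[float]) -> float:
    """Hypothetical estimator: mean of neighbors' observed access times (seconds)."""
    if not neighbor_obs:
        # No observations (e.g. neighbors lost their memory after churn):
        # treat the node as effectively inaccessible.
        return float("inf")
    return mean(neighbor_obs)


def select_node(candidates: dict[str, tuple[float, list[float]]]) -> str:
    """Accessibility-aware choice: minimize compute time + estimated access time.

    candidates maps node id -> (compute_time_seconds, neighbor_access_observations).
    """
    def total_time(item: tuple[str, tuple[float, list[float]]]) -> float:
        compute_time, obs = item[1]
        return compute_time + estimate_access_time(obs)

    return min(candidates.items(), key=total_time)[0]


def select_by_compute_only(candidates: dict[str, tuple[float, list[float]]]) -> str:
    """Baseline that ignores data access and picks the fastest node."""
    return min(candidates.items(), key=lambda item: item[1][0])[0]


# Illustrative inputs: a fast node behind a slow link vs. a slower, well-connected node.
candidates = {
    "fast_cpu_slow_link": (10.0, [120.0, 95.0, 110.0]),
    "slower_cpu_good_link": (25.0, [12.0, 15.0, 9.0]),
}

print(select_by_compute_only(candidates))
print(select_node(candidates))
```

Here the compute-only baseline picks `fast_cpu_slow_link`, while the accessibility-aware choice picks `slower_cpu_good_link` (about 118 s vs. 37 s estimated total), mirroring the point that a slow connection can offset the benefit of a fast node.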