Reputation-Based Scheduling on Unreliable Distributed Infrastructures

Sonnek, Jason; Nathan, Mukesh; Chandra, Abhishek; Weissman, Jon

2020-09-02
2020-09-02
2005-11-21

https://hdl.handle.net/11299/215677

This paper presents a design and analysis of scheduling techniques to cope with the inherent unreliability and instability of worker nodes in large-scale donation-based distributed infrastructures such as P2P and Grid systems. In particular, we focus on nodes that execute tasks via donated computational resources and may behave erratically or maliciously. We present a model in which reliability is not a binary property but a statistical one based on a node's prior performance and behavior. We use this model to construct several reputation-based scheduling algorithms that employ estimated reliability ratings of worker nodes for efficient task allocation. Our scheduling algorithms are designed to adapt to changing system conditions as well as non-stationary behavior of node reliability. Through simulation of a BOINC-like distributed computing infrastructure, we demonstrate that our algorithms can significantly improve throughput while maintaining a very high success rate of task completion. Our results also indicate that reputation-based scheduling can handle a wide variety of worker populations, including non-stationary behavior, with overhead that scales well with system size.

en-US

Report