Many applications must ingest rapid streams of data and produce analytics results in
near-real-time. Whether the input streams represent sensor data from smart homes, user
interaction logs from streaming video clients, or server logs from a content delivery
network (CDN), it is common for such streams to originate from geographically
distributed sources. The typical infrastructure for processing these geo-distributed streams
follows a hub-and-spoke model, where several edge resources perform partial computation
before forwarding results over a wide-area network (WAN) to a central location for final
processing. Due to limited WAN bandwidth, it is not always possible to produce exact results
in near-real-time. When this is the case, applications must either sacrifice timeliness by
allowing delayed (and in turn stale) results, or sacrifice accuracy by allowing some error
in final results. In this paper, we focus on windowed grouped aggregation, an important and
widely used primitive in streaming analytics, and we study the tradeoff between the key
metrics of staleness and error. We present optimal offline algorithms for minimizing
staleness under an error constraint and for minimizing error under a staleness constraint.
Using these offline algorithms as references, we present practical online algorithms for
effectively trading off timeliness and accuracy in the face of bandwidth limitations.
Using a workload derived from a web analytics service offered by a large commercial CDN,
we demonstrate the effectiveness of our techniques through a trace-driven simulation.
Our results show that our proposed algorithms outperform several baseline algorithms
across a range of error and staleness bounds, a variety of aggregation functions,
and different network bandwidth constraints.
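
As a rough illustration of the windowed grouped aggregation primitive in the hub-and-spoke setting described above, the following minimal Python sketch has each edge compute partial per-window, per-key aggregates that a central location then merges. It is not the paper's algorithm: the function names, the (timestamp, key, value) record layout, the use of tumbling windows, and the choice of sum as the aggregation function are all assumptions made for illustration, and the sketch ignores bandwidth limits, staleness, and error entirely.

```python
from collections import defaultdict

def edge_partial_aggregate(records, window_size):
    """Hypothetical edge step: sum values per (window, key).

    records: iterable of (timestamp, key, value) tuples (assumed layout).
    window_size: length of each tumbling window, in the timestamp's units.
    """
    partial = defaultdict(float)
    for timestamp, key, value in records:
        window = int(timestamp // window_size)  # index of the tumbling window
        partial[(window, key)] += value
    return dict(partial)

def center_merge(partials):
    """Hypothetical central step: merge partial sums received from all edges."""
    merged = defaultdict(float)
    for partial in partials:
        for window_key, value in partial.items():
            merged[window_key] += value
    return dict(merged)

# Two edges each observe view events for some keys (made-up sample data).
edge_a = [(0.5, "video1", 1), (3.0, "video2", 1), (12.0, "video1", 1)]
edge_b = [(1.0, "video1", 1), (11.0, "video1", 1)]

result = center_merge([
    edge_partial_aggregate(edge_a, window_size=10),
    edge_partial_aggregate(edge_b, window_size=10),
])
```

Because sum is associative and commutative, merging the partial results at the center gives the same answer as aggregating all raw records centrally, which is what lets edges reduce the data sent over the WAN.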
Heintz, Benjamin; Chandra, Abhishek; Sitaraman, Ramesh K.
Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics.
Retrieved from the University of Minnesota Digital Conservancy,