Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics
Published Date
2016-03-03
Type
Report
Abstract
Many applications must ingest rapid streams of data and produce analytics results in
near-real-time. Whether the input streams represent sensor data from smart homes, user
interaction logs from streaming video clients, or server logs from a content delivery
network (CDN), it is common for such streams to originate from geographically
distributed sources. The typical infrastructure for processing these geo-distributed streams
follows a hub-and-spoke model, where several edge resources perform partial computation
before forwarding results over a wide-area network (WAN) to a central location for final
processing. Due to limited WAN bandwidth, it is not always possible to produce exact results
in near-real-time. When this is the case, applications must either sacrifice timeliness by
allowing delayed (and in turn stale) results, or sacrifice accuracy by allowing some error
in final results. In this paper, we focus on windowed grouped aggregation, an important and
widely used primitive in streaming analytics, and we study the tradeoff between the key
metrics of staleness and error. We present optimal offline algorithms for minimizing
staleness under an error constraint and for minimizing error under a staleness constraint.
Using these offline algorithms as references, we present practical online algorithms for
effectively trading off timeliness and accuracy in the face of bandwidth limitations.
Using a workload derived from a web analytics service offered by a large commercial CDN,
we demonstrate the effectiveness of our techniques through a trace-driven simulation.
Our results show that our proposed algorithms outperform several baseline algorithms
across a range of error and staleness bounds, a variety of aggregation functions,
and different network bandwidth constraints.
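The tradeoff described in the abstract can be illustrated with a toy sketch. This is not the report's offline or online algorithm; it is a minimal illustration under stated assumptions: one edge node computes a grouped sum over a single tumbling window, the WAN link carries a fixed number of per-key updates per second, staleness is modeled as the time needed to transmit after the window closes, and error is modeled as the fraction of total aggregate magnitude that never reaches the center. All function names and parameters below are hypothetical.

```python
from collections import defaultdict


def aggregate_window(records, window_start, window_len):
    """Grouped sum over one tumbling window: key -> sum of values.

    Each record is a (timestamp, key, value) tuple.
    """
    sums = defaultdict(float)
    for ts, key, value in records:
        if window_start <= ts < window_start + window_len:
            sums[key] += value
    return dict(sums)


def exact_but_stale(aggregates, updates_per_sec):
    """Send every per-key aggregate; return (results, staleness in seconds).

    Assumption: one update per key, sent after the window closes.
    """
    staleness = len(aggregates) / updates_per_sec
    return dict(aggregates), staleness


def timely_but_approximate(aggregates, updates_per_sec, staleness_bound):
    """Send only the updates that fit within the staleness bound.

    Largest-magnitude aggregates go first (a simple heuristic chosen for
    illustration). Returns (results, relative error).
    """
    budget = int(updates_per_sec * staleness_bound)
    ranked = sorted(aggregates.items(), key=lambda kv: abs(kv[1]), reverse=True)
    sent = dict(ranked[:budget])
    total = sum(abs(v) for v in aggregates.values())
    missing = sum(abs(v) for k, v in aggregates.items() if k not in sent)
    error = missing / total if total else 0.0
    return sent, error


# Hypothetical input: (timestamp, key, value); the last record falls
# outside the window [0, 10) and is ignored.
records = [(0.5, "a", 5), (1.0, "b", 1), (2.0, "a", 3), (3.0, "c", 2), (12.0, "a", 9)]
aggs = aggregate_window(records, window_start=0, window_len=10)
exact, staleness = exact_but_stale(aggs, updates_per_sec=1)
approx, error = timely_but_approximate(aggs, updates_per_sec=1, staleness_bound=2)
```

With these inputs, sending all three per-key sums takes three seconds at one update per second, whereas a two-second staleness bound admits only the two largest sums and leaves the smallest one undelivered.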
Series/Report Number
Technical Report; 16-003
Suggested citation
Heintz, Benjamin; Chandra, Abhishek; Sitaraman, Ramesh K. (2016). Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215988.