Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics

Published Date

2016-03-03

Type

Report

Abstract

Many applications must ingest rapid streams of data and produce analytics results in near-real-time. Whether the input streams represent sensor data from smart homes, user interaction logs from streaming video clients, or server logs from a content delivery network (CDN), it is common for such streams to originate from geographically distributed sources. The typical infrastructure for processing these geo-distributed streams follows a hub-and-spoke model, where several edge resources perform partial computation before forwarding results over a wide-area network (WAN) to a central location for final processing. Due to limited WAN bandwidth, it is not always possible to produce exact results in near-real-time. When this is the case, applications must either sacrifice timeliness by allowing delayed (and in turn stale) results, or sacrifice accuracy by allowing some error in final results. In this paper, we focus on windowed grouped aggregation, an important and widely used primitive in streaming analytics, and we study the tradeoff between the key metrics of staleness and error. We present optimal offline algorithms for minimizing staleness under an error constraint and for minimizing error under a staleness constraint. Using these offline algorithms as references, we present practical online algorithms for effectively trading off timeliness and accuracy in the face of bandwidth limitations. Using a workload derived from a web analytics service offered by a large commercial CDN, we demonstrate the effectiveness of our techniques through a trace-driven simulation. Our results show that our proposed algorithms outperform several baseline algorithms for a range of error and staleness bounds, for a variety of aggregation functions under different network bandwidth constraints.
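As a rough illustration of the windowed grouped aggregation primitive in a hub-and-spoke setting, the sketch below has each edge sum values per (window, key) group and a central site merge the partial results. It is not taken from the report: the function names `edge_partial_aggregate` and `center_merge`, the tumbling-window assumption, the choice of sum as the aggregation function, and the sample records are all hypothetical, and the sketch does not model bandwidth limits, staleness, or error.

```python
from collections import defaultdict


def edge_partial_aggregate(records, window_size):
    """Sum values per (window, key) group at one edge (hypothetical helper).

    records: iterable of (timestamp, key, value) tuples.
    window_size: length of each tumbling window, in the same units as timestamp.
    """
    partial = defaultdict(float)
    for timestamp, key, value in records:
        window = int(timestamp // window_size)  # index of the tumbling window
        partial[(window, key)] += value
    return dict(partial)


def center_merge(partials):
    """Combine partial sums from several edges at the central site (hypothetical helper)."""
    final = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            final[group] += value
    return dict(final)


# Made-up sample records from two edges; window size of 10 time units is an assumption.
edge_a = edge_partial_aggregate(
    [(0.5, "video", 2.0), (1.2, "video", 3.0), (11.0, "image", 1.0)],
    window_size=10,
)
edge_b = edge_partial_aggregate(
    [(3.0, "video", 4.0), (12.5, "image", 5.0)],
    window_size=10,
)
result = center_merge([edge_a, edge_b])
```

Sum is used here because partial sums from different edges can be merged by simple addition; other aggregation functions would need their own partial-state and merge logic.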

Series/Report Number

Technical Report; 16-003

Suggested citation

Heintz, Benjamin; Chandra, Abhishek; Sitaraman, Ramesh K. (2016). Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215988.
