Kite: A Scalable Microblogs Data Management System

2017-06

Type

Thesis or Dissertation

Abstract

Developers, researchers, and practitioners have been building a myriad of applications to analyze microblogs data, e.g., tweets, online reviews, and user comments. Examples of such applications include citizen journalism, event detection and analysis, geo-targeted advertising, medical research, and studying social influence in the social sciences. Building such applications requires data management infrastructure to deal with microblogs, including data digestion, indexing, and main-memory management. The lack of such infrastructure hinders the scalability and widespread adoption of such applications, especially among users who are not computer scientists. This thesis proposes Kite, an end-to-end system that manages microblogs data at large scale. Using Kite, developers and practitioners can simply write SQL-like queries without worrying about internal data management issues. Internally, Kite is equipped with scalable indexing and main-memory management techniques to support top-k temporal, spatial, keyword, and trending queries on both very recent and historical data. The Kite indexer supports scalable digestion and retrieval of fast incoming data in real time. Recent data are digested into efficient main-memory index structures, which scale up a single machine's indexing capabilities to handle the overwhelming amount of data in real time. Meanwhile, the Kite memory manager monitors the memory contents and smartly decides which data are regularly moved to disk. This is accomplished through effective memory flushing policies designed for top-k query workloads, which are popular on microblogs data. Both in-memory and on-disk data are queried seamlessly through efficient retrieval techniques encapsulated in the Kite query processor. The query processor exploits the top-k ranking function to prune the search space early and significantly reduce query latency. Kite is open-sourced and available to the community to build on (http://kite.cs.umn.edu). Extensive experimentation on different Kite components shows the efficiency and effectiveness of the proposed techniques for managing microblogs data at scale.
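
To make the interplay of in-memory indexing, flushing, and top-k pruning described in the abstract more concrete, here is a minimal Python sketch. It is not Kite's code or API. The class `RecencyIndex`, its parameters, the "keep the k newest postings per keyword" flushing rule, and the use of a plain dictionary as a stand-in for disk are all assumptions made for illustration. The ranking function is assumed to be recency only, and posts are assumed to arrive in timestamp order.

```python
from collections import defaultdict


class RecencyIndex:
    """Toy keyword index answering 'k most recent posts containing a keyword'.

    Hypothetical illustration only; not Kite's actual index or flushing policy.
    """

    def __init__(self, k=2, memory_budget=6):
        self.k = k                          # assumed default top-k size
        self.memory_budget = memory_budget  # max postings held in memory
        self.memory = defaultdict(list)     # keyword -> [(timestamp, post_id)], oldest first
        self.disk = defaultdict(list)       # dictionary standing in for an on-disk index
        self.in_memory = 0                  # number of postings currently in memory
        self.disk_reads = 0                 # how many queries had to touch "disk"

    def insert(self, post_id, timestamp, keywords):
        # Assumption: posts arrive in non-decreasing timestamp order.
        for kw in set(keywords):
            self.memory[kw].append((timestamp, post_id))
            self.in_memory += 1
        if self.in_memory > self.memory_budget:
            self.flush()

    def flush(self):
        # Assumed policy: a recency top-k query needs at most the k newest
        # postings per keyword, so everything older is moved to "disk".
        for kw, postings in self.memory.items():
            excess = len(postings) - self.k
            if excess > 0:
                self.disk[kw].extend(postings[:excess])
                del postings[:excess]
                self.in_memory -= excess

    def top_k(self, keyword, k=None):
        k = self.k if k is None else k
        postings = self.memory.get(keyword, [])
        # Newest first; memory always holds the newest postings of a keyword.
        result = postings[-k:][::-1] if k > 0 else []
        if len(result) < k:
            # Memory alone could not fill the answer, so consult "disk".
            self.disk_reads += 1
            need = k - len(result)
            result.extend(self.disk.get(keyword, [])[-need:][::-1])
        return [post_id for _, post_id in result]


index = RecencyIndex(k=2, memory_budget=4)
for i in range(1, 7):
    index.insert(post_id=i, timestamp=i, keywords=["snow"])

print(index.top_k("snow"))        # [6, 5], answered from memory only
print(index.top_k("snow", k=5))   # [6, 5, 4, 3, 2], needs one "disk" read
```

In this sketch, a query whose k fits within what the flushing rule keeps in memory never reaches the disk-resident data, which is the sense in which a top-k ranking function can prune the search space early. A real system would also need spatial and trending queries, concurrency control, and an actual on-disk structure, none of which are modeled here.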

Description

University of Minnesota Ph.D. dissertation. June 2017. Major: Computer Science. Advisor: Mohamed Mokbel. 1 computer file (PDF); vii, 139 pages.

Suggested citation

Ahmed, Amr. (2017). Kite: A Scalable Microblogs Data Management System. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/190480.
