Kite: A Scalable Microblogs Data Management System
2017-06
Type
Thesis or Dissertation
Abstract
Developers, researchers, and practitioners have been building a myriad of applications to analyze microblogs data, e.g., tweets, online reviews, and user comments. Examples of such applications include citizen journalism, event detection and analysis, geo-targeted advertising, medical research, and the study of social influence in the social sciences. Building such applications requires data management infrastructure to deal with microblogs, including data digestion, indexing, and main-memory management. The lack of such infrastructure hinders the scalability and widespread adoption of such applications, especially among users who are not computer scientists. This thesis proposes Kite, an end-to-end system that is able to manage microblogs data at a large scale. Using Kite, developers and practitioners can simply write SQL-like queries without worrying about the internal data management issues. Internally, Kite is equipped with scalable indexing and main-memory management techniques to support top-k temporal, spatial, keyword, and trending queries on both very recent data and historical data. The Kite indexer supports scalable digestion and retrieval of fast incoming data in real time. Recent data are digested in efficient main-memory index structures. Kite's in-memory index structures scale up a single machine's indexing capabilities to handle the overwhelming amount of data in real time. Meanwhile, the Kite memory manager monitors the memory contents and decides which data to move to disk at regular intervals. This is accomplished through effective memory flushing policies that are designed for top-k query workloads, which are popular on microblogs data. Both in-memory and on-disk data are queried seamlessly through efficient retrieval techniques that are encapsulated in the Kite query processor. The query processor exploits the top-k ranking function to prune the search space early and reduce the query latency significantly.
Kite is open source and available to the community to build on (http://kite.cs.umn.edu). Extensive experimentation on different Kite components shows the efficiency and effectiveness of the proposed techniques for managing microblogs data at scale.
Description
University of Minnesota Ph.D. dissertation. June 2017. Major: Computer Science. Advisor: Mohamed Mokbel. 1 computer file (PDF); vii, 139 pages.
Suggested citation
Ahmed, Amr. (2017). Kite: A Scalable Microblogs Data Management System. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/190480.