Ahmed, Amr
2017-10-09
2017-10-09
2017-06
https://hdl.handle.net/11299/190480
University of Minnesota Ph.D. dissertation. June 2017. Major: Computer Science. Advisor: Mohamed Mokbel. 1 computer file (PDF); vii, 139 pages.
Developers, researchers, and practitioners have been building a myriad of applications to analyze microblogs data, e.g., tweets, online reviews, and user comments. Examples of such applications include citizen journalism, event detection and analysis, geo-targeted advertising, medical research, and studying social influence in the social sciences. Building such applications requires data management infrastructure to deal with microblogs, including data digestion, indexing, and main-memory management. The lack of such infrastructure hinders the scalability and widespread adoption of such applications, especially among users who are not computer scientists. This thesis proposes Kite, an end-to-end system that is able to manage microblogs data at a large scale. Using Kite, developers and practitioners can simply write SQL-like queries without worrying about the internal data management issues. Internally, Kite is equipped with scalable indexing and main-memory management techniques to support top-k temporal, spatial, keyword, and trending queries on both very recent data and historical data. The Kite indexer supports scalable digestion and retrieval of fast incoming data in real time. Recent data are digested in efficient main-memory index structures. Kite's in-memory index structures scale up a single machine's indexing capabilities to handle the overwhelming amount of data in real time. Meanwhile, the Kite memory manager monitors the memory contents and smartly decides which data is regularly moved to disk. This is accomplished through effective memory flushing policies that are designed for top-k query workloads, which are popular on microblogs data.
Both in-memory and on-disk data are queried seamlessly through efficient retrieval techniques that are encapsulated in the Kite query processor. The query processor exploits the top-k ranking function to prune the search space early and reduce the query latency significantly. Kite is open-sourced and available to the community to build on (http://kite.cs.umn.edu). Extensive experimentation on different Kite components shows the efficiency and effectiveness of the proposed techniques for managing microblogs data at scale.
en
Data
Index
Memory management
Microblogs
Query Processing
Twitter
Kite: A Scalable Microblogs Data Management System
Thesis or Dissertation