A Novel GPU Algorithm for Indexing Columnar Databases with Column Imprints

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

A Novel GPU Algorithm for Indexing Columnar Databases with Column Imprints

Published Date

2020-07

Publisher

Type

Thesis or Dissertation

Abstract

Columnar database management systems (CDBMS) are specialized database systems that store data in column-major order, i.e. where all the values of each attribute are stored consecutively in storage, as opposed to row-major order, which stores all the values of each row consecutively and is the most commonly strategy used by relational database systems such as Oracle, SQLServer, PostgreSQL, etc. The column-major order approach makes CDBMS more appropriate for data warehouses because query workloads for the latter systems usually involve retrieving all the values of a small subset of attributes. \par Just like in relational database management systems, query response time performance in CDBMS can greatly benefit from the existence of specially designed data structures, called indexes, that can help avoid exhaustively searching the entire database. However, existing indexing techniques for CDBMS like bitmaps, zonemaps etc. have been shown to result in large storage overhead and memory traffic between storage and CPU. Column imprints are an indexing approach for CDBMS that deals with these two issues by compressing the data in storage and by storing data in such a way that it reduces expensive data transfers between Random Access Memory (RAM) and Central Processing Unit (CPU) caches. However, data compression and decompression, which are necessary for query processing in CDBMS imposes a significant computational burden. To deal with this problem, parallel architectures, such as Graphical Processing Units (GPU) can be used. GPUs are co-processors that have been shown to outperform CPUs for highly parallel tasks. Besides this, GPUs are highly energy efficient, affordable, and available in many types of machines, from mobile devices to supercomputers. Despite their advantages, no work has focused on the use of GPUs for column imprints indexing. To address the gap mentioned before, this thesis introduces the first GPU algorithm, named GPUImprints, for indexing columnar databases with column imprints. We also performed the first experimental study, using one real-world dataset and one synthetic dataset, on the use of GPUs versus CPUs for column imprints in terms of their index creation time, query response times and index space requirements. Our experiments showed that GPUImprints can speedup the column imprints construction time by a factor of 20X and can speedup the query processing times of range queries by a factor of 3.7X when compared to the CPU imprints algorithm.

Keywords

Description

University of Minnesota M.S. thesis. July 2020. Major: Computer Science. Advisor: ELEAZAR LEAL. 1 computer file (PDF); ix, 49 pages.

Related to

Replaces

License

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Suggested citation

Mannem, Manaswi. (2020). A Novel GPU Algorithm for Indexing Columnar Databases with Column Imprints. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/216756.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.