Minglani, Manas
2018-09-21
2018-09-21
2018-06
https://hdl.handle.net/11299/200297
University of Minnesota Ph.D. dissertation. June 2018. Major: Electrical/Computer Engineering. Advisor: David Lilja. 1 computer file (PDF); vii, 100 pages.
As we move toward the future, the world's storage needs are going to be massive and diversified. To meet the needs of the next generation, storage systems must be studied closely and require innovative solutions. These solutions need to solve a multitude of issues, including the high power consumption of traditional systems, manageability, easy scale-out, and integration into existing systems. Therefore, we need to rethink the new technologies from the ground up. To keep the energy signature under control, we devised a new architecture called the Storage Processing Unit (SPU). To model this architecture, we incorporate a processing element inside the storage medium to limit data movement between the storage device and the host processor. This resulted in a hierarchical architecture that required extensive design space exploration along with an in-depth study of the applications. We found that this new architecture provides energy savings of 11-423X and performance gains of 4-66X for applications including k-means, Sparse BLAS, and others. Moreover, to understand the diverse nature of the applications and newer technologies, we applied the concept of in-storage processing to unstructured data. This type of data is growing rapidly and will continue to do so. Seagate's new class of drives, Kinetic Drives, addresses the rise of unstructured data. These drives have a processing element inside the disk drive that executes LevelDB, a key-value store. We evaluated this off-the-shelf device using micro and macro benchmarks for in-depth throughput and latency benchmarking. We observed a sequential write throughput of 63 MB/sec and a sequential read throughput of 78 MB/sec for 1 MB value sizes.
We tested several unique features, including the P2P transfer that takes place in a Kinetic Drive. This new class of drives outperformed traditional server workloads in several test cases. Finally, a large number of these devices are needed for huge amounts of data. To demonstrate that Kinetic Drives reduce the management complexity of large-scale deployment, we conducted a study. We allocated large amounts of data on Kinetic Drives and then evaluated the performance of the system when migrating data among drives. Previously developed key indexing schemes were evaluated, which gave important insights into their performance differences. Based on this study, we conclude that an efficient mapping of key-value pairs to drives can be obtained. This led to an understanding of the trade-offs between the number of empty drives and the mapping of different key ranges to different drives. In conclusion, in-storage processing architectures bring an interesting aspect in which processing is moved closer to the data. This leads to a paradigm shift that often results in major software and hardware architectural changes. Furthermore, the new architectures have the potential to perform better than traditional systems but require easy integration with existing systems.
en
in-storage processing
leveldb
Machine Learning Algorithms
Object Storage Device
Performance Evaluation
Storage Architecture
Performance Evaluation of In-storage Processing Architectures for Diverse Applications and Benchmarks
Thesis or Dissertation