The terms dual-core and quad-core have become ubiquitous for modern computer processors. Where processors previously had a single powerful processing core, they now have multiple lower power processing cores. While some applications can readily be adapted to this change in computer architecture, a large class of applications dealing with unstructured data poses significant challenges to achieving performance on these new architectures. In this thesis we propose new algorithms and methods for the problems of graph partitioning, fill reducing ordering, and graph clustering on multicore architectures. These problems are used for the discovery and introduction of structure into this data. This is important in the fields of scientific and parallel computing, data sciences, life sciences, and integrated circuit design. Our new methods for graph partitioning exploit current multicore architectures to greatly reduce both runtime and memory requirements. Our methods rely on a model of coarse grained parallelism while effectively maximizing data locality. We develop new methods for each phase of multilevel graph partitioning including a new aggregation method, an adaptive parallel formulation of initial partitioning, a high speed method for performing greedy refinement, and a new method for high quality refinement which is highly scalable. These new algorithms result in 3x lower runtimes than previous methods, while matching them in terms of quality using the greedy refinement method. Our new high quality and highly scalable refinement method improves the solution quality by 8%. We develop new methods for generating fill reducing orderings of sparse matrices. This includes a new method of vertex separator refinement which manages to achieve effective parallelization while still matching the quality of the best serial algorithms. We introduce a task scheduling method specifically for nested dissection and show that it results in significantly better performance over the task schedulers available in current threading libraries. Overall, our algorithms for nested dissection are 1.5x faster than existing parallel methods and reduce the fill-in by 3.7% and operation count by 14% of Cholesky factorization compared to the orderings produced by other parallel methods. This matches the quality of orderings generated by serial methods while achieving a 10.1x speedup on 16 cores. Finally, we develop new serial and parallel algorithms for the graph clustering problem. We take many of the strategies we used for parallelizing graph partitioning on multicore architectures, and adapt them to the modularity maximization problem. We develop new methods for vertex aggregation, initial clustering, and cluster refinement. Our serial algorithms are an order of magnitude faster than state of the art serial methods while producing clusterings of equal or greater quality. Our parallel algorithms are 4.5--27.2x faster than other parallel methods and achieve 8.9x speedup on 16 cores while exhibiting less than 1% degradation in solution quality compared to their counterparts. Our implementation of these algorithms can cluster a three billion edge graph in under 90 seconds on a single computer.
University of Minnesota Ph.D. dissertation. December 2015. Major: Computer Science. Advisor: George Karypis. 1 computer file (PDF); xiii, 154 pages.
Graph Partitioning, Ordering, and Clustering for Multicore Architectures.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.