Browsing by Subject "High performance computing"
Now showing 1 - 4 of 4
Item: Data dissemination for distributed computing (2010-02). Kim, Jinoh.
Large-scale distributed systems provide an attractive scalable infrastructure for network applications. However, the loosely-coupled nature of this environment can make data access unpredictable and, in the limit, unavailable. This thesis strives to provide predictability in data access for data-intensive computing in large-scale computational infrastructures. A key requirement for achieving predictability in data access is the ability to estimate network performance for data transfer, so that computation tasks can take advantage of the estimation in their deployment or data source selection. This thesis develops a framework called OPEN (Overlay Passive Estimation of Network Performance) for scalable network performance estimation. OPEN provides an estimation of end-to-end accessibility for applications by utilizing past measurements without the use of explicit probing. Unlike existing passive approaches, OPEN is not restricted to pairwise paths or to a single network when utilizing historical information; instead, it shares measurements between nodes without any restrictions. As a result, it achieves n² estimations from O(n) measurements. In addition, this thesis considers data dissemination in two specific environments. First, we consider a parallel data access environment in which multiple replicated servers can be utilized to download a single data file in parallel. To improve both performance and fault tolerance, we present a new parallel data retrieval algorithm and explore a broad set of resource selection heuristics. Second, we consider collective data access in applications for which group performance is more important than individual performance. In this work, we employ communication makespan as a group performance metric and propose server selection heuristics to maximize collective performance.
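To make the collective-access metric above concrete, the short sketch below computes a communication makespan for a hypothetical group of clients and assigns servers with a simple greedy rule. It is only an illustration of the metric, not the thesis's selection heuristics; the transfer-time estimates and the assumption that a server transfers to its assigned clients sequentially are made up for the example.

```cpp
// Illustrative sketch only: communication makespan as a group metric, with a
// simple greedy server-selection rule. Not the thesis's heuristics; the cost
// values below are hypothetical stand-ins for estimated transfer times.
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    // cost[c][s]: estimated transfer time for client c when served by server s.
    std::vector<std::vector<double>> cost = {
        {4.0, 7.0, 3.0},
        {6.0, 2.0, 5.0},
        {8.0, 3.0, 4.0},
        {5.0, 6.0, 2.0},
    };
    const std::size_t numServers = cost[0].size();

    // Greedy rule: give each client to the server that keeps that server's
    // finishing time (and hence the group makespan) as small as possible.
    std::vector<double> load(numServers, 0.0);
    for (std::size_t c = 0; c < cost.size(); ++c) {
        std::size_t best = 0;
        double bestFinish = 1e300;
        for (std::size_t s = 0; s < numServers; ++s) {
            double finish = load[s] + cost[c][s];
            if (finish < bestFinish) { bestFinish = finish; best = s; }
        }
        load[best] += cost[c][best];
        std::cout << "client " << c << " -> server " << best << "\n";
    }
    // Communication makespan: the group is done when the slowest server is.
    std::cout << "makespan = "
              << *std::max_element(load.begin(), load.end()) << "\n";
    return 0;
}
```

The point of the metric is visible in the output: improving one client's transfer time only helps the group if that client (or its server) was on the critical path that determines the makespan.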
Item: A novel graphical processing unit method for power systems security analysis (2013-06). Miller, Laurie Elizabeth.
There is an increasing need for computational power to drive the software tools used in power systems planning and operations, since the emergence of modern energy markets and recent renewable generation technology fundamentally alters how energy flows through the existing power grid. While special-purpose hardware, including supercomputers, has been explored for this purpose, inexpensive commodity hardware is another way of getting increased computational power into power systems control centers. Adding General-Purpose Graphical Processing Units (GPGPUs) to the nodes in a control center's existing computational platform is a significantly lower expense than adding an equivalent number of new nodes and the infrastructure to support them. If accelerating computations with GPGPUs can halve the time needed for a set of contingencies to run on a given set of computational nodes, freeing up crucial minutes for analysis of additional contingencies, the investment can be worth the costs. Yet this would be considered a quite modest speedup for GPGPU computing if the problem is conditioned in a way that maps well to the architecture and programming model of the GPGPU.
The novel method for GPGPU contingency analysis and its variants presented in this thesis allow that speedup to be taken substantially further, since the method re-maps as much of the computation as possible into a series of dense vector operations based on simple arithmetic that is conservative with respect to data movement and flexible with respect to implementation details such as thread block size. Where sparse matrix operations cannot be avoided, this method, by slicing across contingencies, re-maps such operations to the much more efficient problem of a sparse matrix multiplied by a block of dense vectors larger than the matrix itself. The method applies to (N-1-1), (N-2), and (N-3) contingencies with little modification and little increase in computational burden or data movement per contingency. The method is designed to accommodate systems of thousands to tens of thousands of buses, if need be, with the large power systems resulting from control area consolidation in mind.
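The central data-layout idea named in the abstract, slicing across contingencies so that one sparse matrix is applied to a dense block of right-hand sides rather than to many separate vectors, can be sketched in a few lines of CPU code. The CSR matrix, the sizes, and the all-ones block below are hypothetical; this is a sketch of the layout, not the thesis's GPGPU implementation.

```cpp
// Sketch: sparse matrix (CSR) times a dense block of vectors, one column per
// contingency, so a single pass over the sparse structure updates all
// contingencies at once with a dense, regular inner loop.
#include <iostream>
#include <vector>

int main() {
    // A small 3x3 sparse matrix in compressed sparse row (CSR) form.
    std::vector<int>    rowPtr = {0, 2, 3, 5};
    std::vector<int>    colIdx = {0, 2, 1, 0, 2};
    std::vector<double> val    = {4.0, -1.0, 3.0, -1.0, 5.0};
    const int n = 3;   // matrix dimension (e.g. number of buses)
    const int k = 4;   // number of contingencies sliced into the dense block

    // Dense block B (n x k, row-major): column j belongs to contingency j.
    std::vector<double> B(n * k, 1.0);
    std::vector<double> C(n * k, 0.0);   // result block C = A * B

    for (int i = 0; i < n; ++i)
        for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p)
            for (int j = 0; j < k; ++j)   // dense inner loop over contingencies
                C[i * k + j] += val[p] * B[colIdx[p] * k + j];

    for (int j = 0; j < k; ++j)
        std::cout << "contingency " << j << ": C(0," << j << ") = " << C[j] << "\n";
    return 0;
}
```

The inner loop over the k contingencies is contiguous and branch-free, which is the kind of regular, data-parallel work that maps well onto GPU thread blocks, in contrast to a sparse matrix applied to one vector at a time.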
Item: Performance portability strategies for Computational Fluid Dynamics (CFD) applications on HPC systems (2013-06). Lin, Pei-Hung.
Achieving high computational performance on large-scale high performance computing (HPC) systems demands optimizations that exploit hardware characteristics. Various optimizations and research strategies have been implemented to improve performance, with emphasis on single or multiple hardware characteristics. Among these approaches, the domain-specific approach, which draws on domain expertise, shows high potential for achieving high performance while maintaining performance portability. Deep memory hierarchies, single instruction multiple data (SIMD) engines, and multiple processing cores in the latest CPUs pose many challenges to programmers seeking significant fractions of peak performance. Programming for high-performance computation on modern CPUs has to address thread-level parallelization across multiple cores, data-level parallelization on SIMD engines, and memory utilization across the multi-level memory hierarchy. Using multiple computational nodes, each with multiple CPUs, to scale up the computation without sacrificing performance increases the programming burden significantly. As a result, performance portability has become a major challenge for programmers. It is well known that manually tuned programs can assist the compiler in delivering the best performance. However, generating these optimized codes requires deep understanding of application design, hardware architecture, and compiler optimizations, as well as knowledge of the specific domain, and such a manual tuning process has to be repeated for each new hardware design. To address this issue, this dissertation proposes strategies that exploit the advantages of domain-specific optimizations to achieve performance portability. This dissertation shows that the combination of the proposed strategies can effectively exploit both the SIMD engine and on-chip memory, and that a high fraction of peak performance can be achieved after such optimizations. The design of the pre-compilation framework makes it possible to automate these optimizations. Adopting the latest compiler techniques to assist domain-specific optimizations has high potential for implementing sophisticated and legal transformations. This dissertation provides a preliminary study using polyhedral transformations to implement the proposed optimization strategies; several obstacles need to be removed to make this technique applicable to large-scale scientific applications.
With the research presented in this dissertation and the tasks suggested as future work, the ultimate goal of delivering performance portability with automation is feasible for CFD applications.
Item: QC3D: A Scalable, High Performance Implementation of the Quasicontinuum Method (2023-08). Whalen, Stephen.
Multiscale methods allow for computer simulations of materials at larger scales, finer detail, and higher speed than traditional simulations can offer. The quasicontinuum method (QC) is a multiscale method, combining atomistic and continuum models with a sharp interface between regions. We present QC3D, a complete three-dimensional implementation of QC for multilattice materials. QC3D uses hybrid parallelization to run efficiently on shared-memory and distributed-memory computing systems, scaling to thousands of compute cores. Two example applications demonstrate QC3D’s effectiveness in correctly capturing material responses to deformation, in less compute time than fully atomistic simulators require.
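The QC3D abstract describes its hybrid parallelization only at a high level. The sketch below shows the common pairing of MPI ranks across nodes with OpenMP threads within each node; treating QC3D's hybrid scheme as MPI plus OpenMP is an assumption here, and the per-atom energy reduction is a placeholder standing in for real quasicontinuum kernels.

```cpp
// Minimal hybrid-parallel sketch (assumed MPI + OpenMP pairing, not confirmed
// QC3D internals): MPI ranks partition the atoms across nodes, OpenMP threads
// share each rank's partition, and a placeholder energy sum stands in for the
// real force/energy kernels. Build with an MPI compiler wrapper, e.g. mpicxx.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank owns a slice of the atoms; threads share the slice on-node.
    const long total = 1000000;
    const long local = total / size + (rank < total % size ? 1 : 0);
    std::vector<double> energy(local, 1.0);   // placeholder per-atom energies

    double localSum = 0.0;
    #pragma omp parallel for reduction(+:localSum)
    for (long i = 0; i < local; ++i)
        localSum += energy[i];                // stand-in for a real kernel

    double globalSum = 0.0;
    MPI_Reduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("total energy = %f from %d ranks x %d threads\n",
                    globalSum, size, omp_get_max_threads());
    MPI_Finalize();
    return 0;
}
```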