Browsing by Subject "Compiler"
Now showing 1 - 2 of 2
Item: Efficient dynamic program monitoring on multi-core platforms (2012-06)
He, Guojin

Software security and reliability have become increasingly important in the modern world. An effective approach to enforcing software security and reliability is to monitor a program's execution at run time. However, instrumentation-based implementation of a dynamic program monitor on single-core systems suffers significant performance overhead. As multi-core architectures become mainstream, implementing efficient dynamic program monitoring by assigning monitoring activities to separate processor cores, and thus reducing performance overhead, becomes not only a feasible but also an appealing way to enforce software security and reliability. To achieve efficient and flexible multi-core-based dynamic program monitoring, however, three challenging issues must be carefully considered and adequately addressed: the hardware support, the monitoring model, and the parallelization of monitoring tasks. This dissertation proposes novel solutions to these problems. The hardware support proposed in this dissertation, referred to as extraction logic, selectively extracts execution information from the monitored program and forwards it to a monitor running on a separate CPU core. The extraction logic is generic and configurable by the monitor, so it can support a large spectrum of monitoring tasks. Based on this generic hardware support, this dissertation proposes a novel monitoring model, referred to as the distill-based monitor model. Monitors in this execution model are generated by special compiler support. The distill-based monitor model is based on the observation that a monitor needs only partial information from the monitored execution, and that some of this needed information can be easily computed by the monitor from other information that has already been communicated.
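The forward-versus-compute tradeoff at the heart of the distill-based model can be illustrated with a small toy sketch (all names here are hypothetical; the actual system uses hardware extraction logic and compiler-generated monitors, not Python):

```python
# Toy sketch of the distill-based monitoring idea: the monitored side
# forwards only values the monitor cannot derive on its own; everything
# derivable is "distilled" out, saving communication.

def monitored_program(trace):
    """Emit only events whose values are known solely at run time."""
    for op, *args in trace:
        if op == "load":            # runtime memory address: must forward
            yield ("load", args[0])
        elif op == "add":           # register-to-register: the monitor can
            pass                    # replay this itself, so forward nothing

def taint_monitor(events, tainted_addrs):
    """Replay the forwarded events and track which registers become tainted."""
    tainted_regs = set()
    for op, addr in events:
        if op == "load" and addr in tainted_addrs:
            tainted_regs.add("r0")  # simplification: loads target r0 here
    return tainted_regs

trace = [("load", 0x1000), ("add", "r1", "r0"), ("load", 0x2000)]
regs = taint_monitor(monitored_program(trace), tainted_addrs={0x1000})
print(regs)  # {'r0'}
```

Only one of the three trace entries is forwarded with a payload; the register-to-register operation is reconstructed by the monitor, which is the kind of saving the code generator described below optimizes for.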
We implemented a code generator and optimization techniques that decide which information to forward and which to compute, so as to minimize the total execution time of the monitor. This compiler support can optimize a variety of monitors with diverse monitoring requirements, taking as input the control flow graph of the monitored program and the set of monitoring requirements. To parallelize monitoring tasks, this dissertation proposes a novel parallelization paradigm built on General-Purpose Computing on Graphics Processing Units (GPGPU). In the following chapters, we first propose a generic, purely software-based GPGPU monitoring framework that is flexible enough to support parallelization of various kinds of monitoring tasks. Furthermore, we propose software-based optimization techniques built on this framework that effectively exploit characteristics of monitoring tasks such as taint propagation and memory-bug detection, and thus achieve significant performance improvements. This dissertation reports the performance improvement achieved by the proposed monitoring model and parallelization paradigm. Relative to a traditional instrumentation-based monitor for taint propagation and memory-bug detection, the proposed compiler support reduces performance overhead by 3.7 times and 2.2 times, respectively, on the SPEC2006INT benchmarks. The proposed GPGPU-based monitor with optimization achieves even more for memory-bug detection, reducing performance overhead by 5.2 times.

Item: A strategy for high performance in computational fluid dynamics (2013-08)
Jayaraj, Jagan

Computational fluid dynamics is an important area in scientific computing. The weak scaling of codes is well understood, with about two decades of experience using MPI.
The recent proliferation of multi- and many-core processors has made modern nodes compute-rich, and per-node performance has become crucial for overall machine performance. However, despite the use of thread programming, obtaining good performance at each core is extremely challenging. The challenges are primarily due to memory bandwidth limitations and difficulties in using the short SIMD engines effectively. This thesis presents techniques, strategies, and a tool to improve in-core performance. Fundamental to the strategy is a hierarchical data layout made of small cubical structures of the problem state called briquettes. The difficulties of computing spatial derivatives (also called near-neighbor computations in the literature) in a hierarchical data layout are well known, and data blocking is extremely unusual in finite-difference codes. This work details how to simplify programming for the new data layout, the inefficiencies of the programming strategy, and how to overcome them. The transformation that eliminates the overheads is called pipeline-for-reuse. It is followed by a storage optimization called maximal array contraction. Both pipeline-for-reuse and maximal array contraction are highly tedious and error-prone to apply by hand. Therefore, we built a source-to-source translator called CFD Builder to automate the transformations using directives. The directive-based approach we adopted eliminates the need for complex analysis, and this work provides linear-time algorithms to perform the transformations under the stated assumptions. The benefits of briquettes and CFD Builder are demonstrated individually with three different applications on two different architectures and two different compilers. Applying both techniques yields up to a 6.92x performance improvement.
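The indexing difficulty of near-neighbor computations over a blocked layout can be seen in a minimal 1-D sketch (the dissertation uses small cubical 3-D briquettes; the names and block size here are our own illustration):

```python
# Toy sketch of a briquette-style blocked data layout in 1-D.
# A neighbor value needed by a stencil may live in the adjacent
# briquette, which is the indexing difficulty the text refers to.

BQ = 4  # briquette edge length (hypothetical choice)

def to_briquettes(field):
    """Split a flat field into contiguous briquettes."""
    return [field[i:i + BQ] for i in range(0, len(field), BQ)]

def central_difference(briquettes, h=1.0):
    """Spatial derivative (near-neighbor computation) over the layout."""
    flat = [x for b in briquettes for x in b]   # recover the logical view
    return [(flat[i + 1] - flat[i - 1]) / (2 * h)
            for i in range(1, len(flat) - 1)]

field = [float(i * i) for i in range(8)]        # sample f(x) = x^2
bqs = to_briquettes(field)
print(central_difference(bqs))                  # approximates d/dx x^2 = 2x
```

A production code would not flatten the blocks back into a logical array on every sweep; avoiding exactly that kind of redundant data movement is what the pipeline-for-reuse transformation addresses.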
This strategy with briquettes and CFD Builder was also evaluated against commonly known transformations for data locality and vectorization. Briquettes combined with the pipeline-for-reuse transformation outperform even the best manually applied combination of canonical transformations for data locality and vectorization, by up to 2.15x.
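The flavor of pipeline-for-reuse followed by maximal array contraction can be conveyed with a small sketch (our own toy example, not code produced by CFD Builder): two loop passes connected by a full temporary array are fused into one pass, and the temporary is contracted down to a single scalar carried between iterations.

```python
# Toy sketch: two stencil-like passes with a full temporary array,
# versus the pipelined version where the temporary is contracted
# to one scalar (the essence of maximal array contraction).

def naive(f):
    """Pass 1 materializes a full temporary; pass 2 consumes it."""
    avg = [(f[i] + f[i + 1]) / 2 for i in range(len(f) - 1)]   # pass 1
    return [avg[i] - avg[i - 1] for i in range(1, len(avg))]   # pass 2

def contracted(f):
    """Single fused pass; the temporary array becomes one scalar."""
    out = []
    prev_avg = (f[0] + f[1]) / 2
    for i in range(1, len(f) - 1):
        cur_avg = (f[i] + f[i + 1]) / 2   # produce the value in the pipeline
        out.append(cur_avg - prev_avg)    # consume it immediately
        prev_avg = cur_avg                # only storage carried forward
    return out

f = [1.0, 3.0, 6.0, 10.0, 15.0]
assert naive(f) == contracted(f)          # same result, O(1) temporary storage
```

Performing this kind of fusion and contraction by hand across a real CFD code is exactly the tedious, error-prone work the abstract describes CFD Builder automating from directives.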