Browsing by Author "Das, Abhinav"
Issues and Support for Dynamic Register Allocation (2006-06-21)
Das, Abhinav; Fu, Rao; Zhai, Antonia; Hsu, Wei-Chung
Post-link and dynamic optimizations have become important to achieve program performance. This is because it is difficult to produce a single binary that fits all micro-architectures and provides good performance for all inputs. A major challenge in post-link and dynamic optimizations is the acquisition of registers for inserting optimization code with the main program. We show that it is difficult to achieve both correctness and transparency when only software schemes for acquiring registers are used. We then propose an architecture feature that builds upon existing hardware for stacked register allocation on the Itanium processor. The hardware impact of this feature is minimal, while simultaneously allowing post-link and dynamic optimization systems to obtain registers for optimization in a "safe" manner, thus preserving the transparency and improving the performance of these systems.

Performance of Runtime Optimization on BLAST (2004-10-15)
Das, Abhinav; Lu, Jiwei; Chen, Howard; Kim, Jinpyo; Yew, Pen-Chung; Hsu, Wei-Chung; Chen, Dong-yuan
Optimization of a real-world application, BLAST, is used to demonstrate the limitations of static and profile-guided optimizations and to highlight the potential of runtime optimization systems. We analyze the performance profile of this application to determine performance bottlenecks and evaluate the effect of aggressive compiler optimizations on BLAST. We find that applying common optimizations (e.g., O3) can degrade performance. Profile-guided optimizations do not show much improvement across the board, as current implementations do not address critical performance bottlenecks in BLAST. In some cases, these optimizations lower performance significantly due to unexpected secondary effects of aggressive optimizations. We also apply runtime optimization to BLAST using the ADORE framework. ADORE speeds up some queries by as much as 58% using data cache prefetching. Branch mispredictions can also be significant for some input sets. Dynamic optimization techniques to improve branch prediction accuracy are described and examined for the application. We find that the primary limitation to the application of runtime optimization for branch misprediction is the tight coupling between data and the dependent branch. With better hardware support for influencing branch prediction, a runtime optimizer may deploy optimizations to reduce branch misprediction stalls.

PerfView: A Performance Monitoring and Visualization Tool for Intel Itanium Architecture (2004-07-27)
Lingamneni, Ananth; Das, Abhinav; Hsu, Wei-Chung
Application performance analysis in modern microprocessors has become extremely complex due to substantial instruction-level parallelism, complex processor pipelines and deep memory hierarchies. Performance analysts need to have a thorough understanding of the dynamic behavior of programs in order to identify and fix performance bottlenecks. In order to help in the performance analysis process, modern-day processors provide hardware support in the form of performance registers that capture micro-architectural events at program runtime. However, the data provided by these hardware registers is at a very low level, and an extensive effort has to be made by performance analysts to make sense of the data. Therefore, it is extremely beneficial to make use of performance analysis tools that can assemble various types of performance-related information available from the performance registers to provide a high-level summary of the data. In this paper, we discuss PerfView, which is (1) a source-code-based visualization tool, (2) a tool that identifies and allows users to view performance-critical events based on execution paths, and (3) an interactive performance monitoring and debugging tool.