Browsing by Author "Luo, Yangchun"
Item Efficiency of Thread-Level Speculation in SMT and CMP Architectures - Performance, Power and Thermal Perspective (2008-06-13) Packirisamy, Venkatesan; Luo, Yangchun; Hung, Wei-Lung; Zhai, Antonia; Yew, Pen-Chung
The computer industry adopted multi-threaded and multi-core architectures as clock-rate increases stalled in the early 2000s. However, because of the lack of compilers and other related software technologies, most general-purpose applications today still cannot take advantage of such architectures to improve their performance. Thread-level speculation (TLS) has been proposed as a way of using these multi-threaded architectures to parallelize general-purpose applications. Both simultaneous multithreading (SMT) and chip multiprocessors (CMP) have been extended to implement TLS. While the characteristics of SMT and CMP have been widely studied under multi-programmed and parallel workloads, their behavior under TLS workloads is not well understood. TLS workloads exhibit unique characteristics that set them apart from other workloads, owing to the speculative nature of the threads, which can potentially be rolled back, and to the variable degree of parallelism available in applications. In this paper, we present a detailed study of the performance, power consumption and thermal effects of these multithreaded architectures against those of a superscalar architecture with equal chip area. A wide spectrum of design choices and tradeoffs is also studied using commonly used simulation techniques. We show that the SMT-based TLS architecture performs about 21% better than the best CMP-based configuration while incurring about 16% power overhead. In terms of the Energy-Delay-Squared product, SMT-based TLS performs about 26% better than the best CMP-based TLS configuration and 11% better than the superscalar architecture.
However, the SMT-based TLS configuration causes more thermal stress than the CMP-based TLS architectures.

Item Exploiting parallelism in multicore processors through dynamic optimizations. (2011-11) Luo, Yangchun
Utilizing multi-core processors efficiently enough to realize their performance potential demands extracting thread-level parallelism from applications. Various novel and sophisticated execution models have been proposed to extract thread-level parallelism from sequential programs. One such execution model, Thread-Level Speculation (TLS), allows potentially dependent threads to execute speculatively in parallel. However, TLS execution is inherently unpredictable, and consequently incorrect speculation can degrade the performance and/or energy efficiency of multi-core systems. To address these issues, this dissertation proposes dynamic optimizations that exploit the benefit of successful speculation while minimizing the impact of failed speculation. First, we propose optimizations to dynamically determine where TLS should be applied in the original sequential program, whereas prior work has focused on using the compiler to statically select program regions. Our research shows that even a state-of-the-art compiler makes suboptimal decisions, due to the unpredictability of TLS execution. In this dissertation, speculative threads are monitored using hardware-based counters and their performance impact is dynamically evaluated. Performance tuning policies are devised to adjust the behavior of speculative threads accordingly. Dynamic performance tuning naturally allows the system to adapt to many program behaviors that are runtime dependent. Second, we propose a heterogeneous multi-core architecture to support energy-efficient TLS. By carefully analyzing the behavior of standard benchmark workloads, we identify a set of heterogeneous components that differ in their power and performance trade-offs and are also feasible to integrate.
We have also devised a resource allocation scheme that dynamically monitors program behavior, analyzes its characteristics, and matches it with the most energy-efficient configuration of the system. Throttling mechanisms are introduced to mitigate the overhead associated with configuration changes. In the context of TLS, our findings show that on-chip heterogeneity and dynamic resource allocation are two key ingredients for achieving performance improvement in an energy-efficient way.