Browsing by Author "Packirisamy, Venkatesan"

Now showing 1 - 4 of 4

Efficiency of Thread-Level Speculation in SMT and CMP Architectures - Performance, Power and Thermal Perspective
(2008-06-13) Packirisamy, Venkatesan; Luo, Yangchun; Hung, Wei-Lung; Zhai, Antonia; Yew, Pen-Chung
Computer industry has adopted multi-threaded and multi-core architectures as the clock rate increase stalled in early 2000.s. However, because of the lack of compilers and other related software technologies, most of the general-purpose applications today still cannot take advantage of such architectures to improve their performance. Thread-level speculation (TLS) has been proposed as a way of using these multi-threaded architectures to parallelize general-purpose applications. Both simultaneous multithreading (SMT) and chip multiprocessors (CMP) have been extended to implement TLS. While the characteristics of SMT and CMP have been widely studied under multi-programmed and parallel workloads, their behavior under TLS workload is not well understood. TLS workload due to speculative nature of the threads which could potentially be rollbacked and due to variable degree of parallelism available in applications, exhibits unique characteristics which makes it different from other workloads. In this paper, we present a detailed study of the performance, power consumption and thermal effect of these multithreaded architectures against that of a superscalar with equal chip area. A wide spectrum of design choices and tradeoffs are also studied using commonly used simulation techniques. We show that the SMT based TLS architecture performs about 21% better than the best CMP based configuration while it suffers about 16% power overhead. In terms of the Energy-Delay-Squared product, SMT based TLS performs about 26% better than the best CMP based TLS configuration and 11% better than the superscalar architecture. But the SMT based TLS configuration, causes more thermal stress than the CMP based TLS architectures.
Exploring efficient architecture design for thread-level speculation---Power and performance perspectives.
(2009-06) Packirisamy, Venkatesan
With the advent of multi-threaded (e.g. simultaneous multi-threading (SMT) ) and/or multi-core (e.g. chip multiprocessors (CMP) [3, 4]) architectures, now the challenge is to utilize these architectures to improve performance of general-purpose applications. However, traditional parallelizing compilers often fail to effectively parallelize general-purpose applications which typically have complex control flow and excessive pointer usage. Thread-Level Speculation (TLS) have been proposed to simplify the task of parallelization by using speculative threads. Though the performance of TLS has been studied in the past, its power consumption, power efficiency and thermal behavior are not well understood. Also previous work on TLS have concentrated on multi-core based architectures and relatively little has been done on supporting TLS on multi-threaded architectures. With increasing multi-threaded/multi-core design choices, it is important to understand the benefits of the different types of architectures. The goal of this dissertation is to explore architecture techniques to efficiently implement TLS in future multi-threaded/multi-core processors. The dissertation proposes a novel cache-based architecture to support TLS in multi-threaded SMT architecture. A detailed study on the efficiency of different TLS architectures was conducted by comparing their performance, power and thermal characteristics. To improve efficiency, the dissertation proposes a novel SMT-CMP based heterogeneous architecture which combines the advantages of both SMT and CMP architectures. The dissertation also proposes novel architecture and compiler techniques to efficiently extract speculative parallelism from multiple loop levels.
Exploring Speculative Parallelism in SPEC2006
(2008-11-04) Packirisamy, Venkatesan; Zhai, Antonia; Yew, Pen-Chung
Computer industry has adopted multi-threaded and multi-core architectures as the clock rate increase stalled in early 2000's. It was hoped that the continuous improvement of single-program performance could be achieved through these architectures. However, traditional parallelizing compilers often fail to effectively parallelize general-purpose applications which typically have complex control flow and excessive pointer usage. Recently hardware techniques like Transactional Memory (TM) and Thread-Level Speculation (TLS) have been proposed to simplify the task of parallelization by using speculative threads. Potential of speculative parallelism in general-purpose applications like SPEC CPU 2000 have been well studied and have shown to be moderately successful. Preliminary work that examined the potential parallelism in SPEC2006 deployed parallel threads with a restrictive TLS execution model and limited compiler support, and thus showed only limited performance potential. In this paper, we first analyze the cross-iteration dependence behavior of SPEC 2006 benchmarks and show that more parallelism potential is available in SPEC 2006 benchmarks, comparing against SPEC2000. Further, we use a state-of-the-art profile-driven TLS compiler to identify loops that can be speculatively parallelized. Overall, we found an average speedup of 60% on four cores over what could be achieved by a traditional parallelizing compiler such as Intel.s ICC compiler on such benchmarks. We also found that an additional 11% improvement could be obtained on selected benchmarks using 8 cores when we extend TLS on multiple loop levels as opposed to restricting TLS only on a single loop level.
Performance and power comparison of Thread Level Speculation in SMT and CMP architectures
(2007-10-30) Packirisamy, Venkatesan; Zhai, Antonia; Hsu, Wei-Chung; Yew, Pen-Chung
As technology advances, microprocessors that support multiple threads of execution on a single chip are becoming increasingly common. Improving the performance of general purpose applications by extracting parallel threads is extremely difficult, due to the complex control flow and ambiguous data dependences that are inherent to these applications. Thread-Level Speculation (TLS) enables speculative parallel execution of potentially dependent threads, and ensures correct execution by providing hardware support to detect data dependence violations and to recover from speculation failures. TLS can be supported on a variety of architectures, among them are Chip MultiProcessors (CMP) and Simultaneous MultiThreading(SMT). While there have been numerous papers comparing the performance and power efficiency of SMT and CMP processors under various workloads, relatively little has been done to compare them under the context of TLS. While CMPs utilize smaller and more power-efficient cores, resource sharing and constructive interference between speculative and non-speculative threads can potentially make SMT more power efficient. Thus, this paper aims to fill this void by extending a CMP and a SMT processor to support TLS, and evaluating the performance and power efficiency of the resulting systems with speculative parallel threads extracted for the SPEC2000 benchmark suite. Both SMT and CMP processors have a large variety of configurations, we choose to conduct our study on two architectures with equal die area and the same clock frequency. Our results show that a SMT processor that supports four speculative threads outperforms a CMP processor that supports the same number of threads, uses the same die area and operates at the same clock frequency by 23% while consuming only 8% more power on selected SPEC2000 benchmarks. In terms of energy-delay product, the same SMT processor is approximately 10% more efficient than the CMP processor.

University Digital Conservancy

Browse by Author

Browsing by Author "Packirisamy, Venkatesan"