Exploring Fine-Grained Process Interaction in Multiprocessor Systems

Published Date

1997

Type

Report

Abstract

Several techniques have been used to improve the performance of process interaction in fine-grained multiprocessor systems. Existing techniques tend to suffer from long memory latencies or synchronization times, or they require complex and expensive hardware. This thesis proposes that user-level hardware and special-purpose communication channels for different interaction domains can dramatically improve access performance at relatively modest hardware cost, and it characterizes specific domains for which this hypothesis holds. New lock and barrier mechanisms are presented that reduce both contention and latency to the minimum values obtainable with shared-bus communication: they require at most two shared-bus transactions, and typically only one. Distributed hardware locking queues and barrier flags reduce to near zero the latency before a process can continue after obtaining a lock or reaching a barrier. Four additional interaction mechanisms are presented that use serial communication between processing elements (PEs) in a way that eliminates inter-PE clocking delays. All of these new techniques increase scalability, apply to both new architectures and existing systems, and are less complex than other hardware solutions. The optimum two-dimensional cluster size for N PEs is shown to be proportional to $\sqrt{NL/D}$, where $L$ and $D$ are the mean inter-node times (including gate delay and time of flight) on the global and local loops, respectively. The access latency under optimal clustering is shown to be proportional to $\sqrt{NLD}$. With conservative parameters and optimal clustering, the maximum numbers of PEs for an expected latency of one microsecond are 15621 for barriers, 61308 for locks, 37698 for shared data, and 14592 for shared registers. All mechanisms are shown to perform near-optimally whenever the configuration is near-optimal for any one of them. Hierarchies beyond two levels are shown to have expected latencies proportional to the sum of all loop times.
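The square-root scaling of the optimal cluster size and latency can be illustrated with a simple assumed cost model. This sketch is not taken from the report. It assumes that an access makes one traversal of a local loop of $c$ nodes, each costing $D$, plus one traversal of a global loop of $N/c$ cluster nodes, each costing $L$. Here $L$ and $D$ are the mean inter-node times on the global and local loops. The model is therefore $T(c) = cD + NL/c$.

Setting $dT/dc = D - NL/c^2 = 0$ gives $c^* = \sqrt{NL/D}$ and $T(c^*) = 2\sqrt{NLD}$. Constant factors, such as averaging over half a loop, would change the coefficients but not these proportionalities. The function names and example parameters are hypothetical. Because the report's conservative parameters are not given here, the sketch does not attempt to reproduce its PE counts.

```python
import math


def access_latency(n_pes: float, cluster_size: float, d_local: float, l_global: float) -> float:
    """Assumed two-level loop model (hypothetical, not from the report).

    An access traverses one local loop of `cluster_size` nodes, each costing
    `d_local`, plus one global loop of n_pes / cluster_size cluster nodes,
    each costing `l_global`.
    """
    return cluster_size * d_local + (n_pes / cluster_size) * l_global


def optimal_cluster_size(n_pes: float, d_local: float, l_global: float) -> float:
    # dT/dc = D - N*L / c**2 = 0  =>  c* = sqrt(N*L/D)
    return math.sqrt(n_pes * l_global / d_local)


def optimal_latency(n_pes: float, d_local: float, l_global: float) -> float:
    # Substituting c* into T(c) gives T* = 2 * sqrt(N*L*D)
    return 2.0 * math.sqrt(n_pes * l_global * d_local)


def max_pes_for_latency(target: float, d_local: float, l_global: float) -> float:
    # Inverting T* = 2 * sqrt(N*L*D) gives N = T**2 / (4*L*D)
    return target ** 2 / (4.0 * l_global * d_local)


if __name__ == "__main__":
    # Illustrative values only: D = 1 ns per local node, L = 4 ns per global node.
    n, d, l = 4096, 1.0, 4.0
    c_star = optimal_cluster_size(n, d, l)
    print(c_star)                           # 128.0
    print(optimal_latency(n, d, l))         # 256.0 (ns)
    print(access_latency(n, c_star, d, l))  # 256.0, agrees with the closed form
    # Brute-force check over every integer cluster size from 1 to N
    best = min(range(1, n + 1), key=lambda c: access_latency(n, c, d, l))
    print(best)                             # 128
```

Under this model, latency grows only as $\sqrt{N}$ when clustering is optimal, so quadrupling N doubles $T^*$. Conversely, for a fixed latency target the PE budget $N = T^2/(4LD)$ grows in inverse proportion to either inter-node time.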

Series/Report Number

Technical Report; 97-020

Suggested citation

Johnson, Donald E. (1997). Exploring Fine-Grained Process Interaction in Multiprocessor Systems. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215302.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.