Exploring Fine-Grained Process Interaction in Multiprocessor Systems
Authors
Johnson, Donald E.
Published Date
1997
Type
Report
Abstract
Several techniques have been used to improve the performance of process interaction in fine-grained
multiprocessor systems. These existing techniques tend to have long memory
latencies or synchronization times, or they require complex and expensive hardware. This
thesis proposes that user-level hardware and special-purpose communications channels for
different interaction domains can dramatically improve access performance with relatively
modest hardware cost. The thesis characterizes some specific domains for which the
hypothesis holds. New lock and barrier mechanisms are presented that reduce both
contention and latency to the minimum values that can be obtained using shared-bus
communications, requiring at most two shared-bus transactions, with one transaction being
typical. Distributed hardware locking queues and barrier flags reduce the latency for process
continuation after obtaining a lock or reaching a barrier to near zero. Four additional
interaction mechanisms that use serial communication between processing elements (PEs)
in a manner that eliminates inter-PE clocking delays are presented. All of these new
techniques increase scalability, are applicable both to new architectures and to existing
systems, and are less complex than other hardware solutions. The optimum two-dimensional
cluster size for N PEs is shown to be proportional to $(Nl/D)^{1/2}$, where $l$ and $D$ are
the mean inter-node times, including gate and time-of-flight, on the global and local loops,
respectively. The access latency when optimally clustered is shown to be proportional to
$(NlD)^{1/2}$. Using conservative parameters when optimally clustered, the maximum numbers
of PEs for an expected latency of one microsecond are: 15621 PEs for barriers, 61308 PEs for
locks, 37698 PEs for shared-data, and 14592 PEs for shared-registers. All mechanisms are shown
to have near-optimum performance if the configuration is near-optimum for any particular
mechanism. Hierarchies beyond two levels were shown to have expected latencies
proportional to the sum of all loop-times.
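
The cluster-size and latency results in the abstract can be illustrated with a small sketch. It assumes a toy cost model that is not taken from the report: an access crosses roughly one local loop of `cluster_size` nodes and one global loop of `n_pes / cluster_size` nodes, so its latency is `cluster_size * t_local + (n_pes / cluster_size) * t_global`. Under that assumption the latency is minimized at a cluster size of sqrt(N·l/D), and the minimum latency scales as sqrt(N·l·D). All function names, parameter names, and numeric values below are hypothetical.

```python
import math


def expected_latency(cluster_size, n_pes, t_global, t_local):
    """Toy latency model (an assumption, not the report's own analysis).

    One pass around a local loop of `cluster_size` nodes plus one pass
    around a global loop of `n_pes / cluster_size` nodes.
    """
    return cluster_size * t_local + (n_pes / cluster_size) * t_global


def optimum_cluster_size(n_pes, t_global, t_local):
    """Cluster size minimizing the toy model: sqrt(N * l / D)."""
    return math.sqrt(n_pes * t_global / t_local)


def optimum_latency(n_pes, t_global, t_local):
    """Latency of the toy model at the optimum: 2 * sqrt(N * l * D)."""
    c = optimum_cluster_size(n_pes, t_global, t_local)
    return expected_latency(c, n_pes, t_global, t_local)


if __name__ == "__main__":
    # Hypothetical values: 1024 PEs, inter-node times in arbitrary units.
    n, l, d = 1024, 4.0, 1.0
    print("optimum cluster size:", optimum_cluster_size(n, l, d))
    print("latency at optimum:  ", optimum_latency(n, l, d))
```

In this model, quadrupling the number of PEs doubles both the optimum cluster size and the latency at the optimum, which matches the square-root proportionality stated in the abstract. The constants of proportionality for the individual mechanisms come from the report itself and are not reproduced here.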
Series/Report Number
Technical Report; 97-020
Suggested citation
Johnson, Donald E.. (1997). Exploring Fine-Grained Process Interaction in Multiprocessor Systems. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215302.