Browsing by Subject "cache coherence"
Now showing 1 - 2 of 2
- Results Per Page
- Sort Options
Item Compiler Analysis for Cache Coherence: Interprocedural Array Data-Flow Analysis and Its Impacts on Cache Performance(1997) Choi, Lynn; Yew, Pen-ChungIn this paper, we present compiler algorithms for detecting references to stale data in sharedmemory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserning analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuses. By computing the regions referenced by arrays inside loops, we extend the previous scalar algorithms (8] for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates cache invalidations at procedure boundaries, which were assumed in the previous compiler algorithms (9]. We have fully implemented the algorithm in the Polaris parallelizing compiler (27). Using execution-driven simulations on Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by the automatic stale reference detection. The algorithm can be used to implement cache coherence in the shared-memory multiprocessors that do not have hardware directories, such as Cray T3D (20].Item Hardware and Compiler-Directed Cache Coherence in Large-Scale Multiprocessors(1997) Choi, Lynn; Yew, Pen-ChungIn this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. The scheme can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. Several system related issues, including critical sections, inter-thread communication, and task migration have also been addressed. The cost of the required hardware support is minimal and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data flow analysis, have been implemented on the Polaris parallelizing compiler [33]. From our simulation study using the Perfect Club benchmarks [5], we found that in spite of the conservative analysis made by the compiler, the performance of the proposed HSCD scheme can be comparable to that of a full-map hardware directory scheme. Given its comparable performance and reduced hardware cost, the proposed scheme can be a viable alternative for large-scale multiprocessors such as the Cray T3D, which rely on users to maintain data coherence.