Step caches - a novel approach to concurrent memory access on shared memory MP-SOCs

doi:10.1109/NORCHP.2005.1596992

Proceedings Article

- pp 74-77

Abstract:

In this paper we introduce a novel class of caches, named step caches, that can be used to implement concurrent memory access in shared memory multithreaded multiprocessor systems on chip (MP-SOC) without cache coherency problems. The main difference between ordinary caches and step caches is that data entered into a step cache is kept valid only until the end of the ongoing step of multithreaded execution. We describe the structure and operation of step caches and give a performance evaluation of step cache systems with different settings using simple parallel programs on our parametrical MP-SOC framework. According to the evaluation, step caches speed up execution by a factor close to the number of processors with respect to a similar system without step caches and almost achieve the performance of ideal shared memory systems in plain concurrent access.
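The validity rule described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's design: the abstract does not give the cache structure, so the names `StepCache`, `lookup`, `fill`, and `run_concurrent_reads` are hypothetical, capacity and associativity are not modelled, and only reads by the threads of a single processor are considered. Each entry is tagged with the step in which it was filled and counts as a hit only during that same step.

```python
class StepCache:
    """Toy model: an entry is valid only during the step in which it was filled."""

    def __init__(self):
        # address -> (step, value); capacity and associativity are not modelled
        self.lines = {}

    def lookup(self, address, step):
        entry = self.lines.get(address)
        if entry is not None and entry[0] == step:
            return entry[1]
        return None  # miss: absent, or left over from an earlier step

    def fill(self, address, value, step):
        self.lines[address] = (step, value)


def run_concurrent_reads(memory, address, threads, steps, use_step_cache):
    """Each thread reads `address` once per step.

    Returns the number of references that reach shared memory.
    Note: `memory` (a dict) is modified in place between steps.
    """
    cache = StepCache()
    memory_references = 0
    for step in range(steps):
        for _ in range(threads):
            value = cache.lookup(address, step) if use_step_cache else None
            if value is None:
                value = memory[address]  # reference goes to shared memory
                memory_references += 1
                if use_step_cache:
                    cache.fill(address, value, step)
        # Hypothetical update between steps, so data cached in an
        # earlier step would be stale if it were still treated as valid.
        memory[address] += 1
    return memory_references
```

In this toy model, 8 threads reading one address over 3 steps send 3 references to shared memory with the step cache and 24 without it. Because stale entries simply stop matching the current step, no coherence traffic is needed to invalidate them, which is the property the abstract highlights.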

Citations

Book Chapter

TOTAL ECLIPSE: An Efficient Architectural Realization of the Parallel Random Access Machine

Martti Forsell

TL;DR: It is hard to imagine that parallel computing would be able to continue the increasing trend of computational performance without solving problems caused by inability of current architectures to hide the latency of shared memory accesses, lack of synchronicity in execution of computational threads as well as too weak models and low-level primitives of parallel computing.

Proceedings Article

Realizing Multioperations for Step Cached MP-SOCs

M. Forsell

TL;DR: An architectural technique for implementing multioperations on step cached MP-SOCs even if the associativity of caches is limited is proposed, based on simple active memory units, faster memory modules, and small processor-level memory blocks called scratchpads.

Journal Article

On the performance and cost of some PRAM models on CMP hardware

Martti Forsell

- 01 Jun 2010 -

International Journal of Foundations of ...

TL;DR: This paper measures the performance and estimates the cost of practical implementations of four PRAM models (EREW, Limited Arbitrary CRCW, Full Arbitrary CRCW, and Full Arbitrary Multioperation CRCW) on the Eclipse chip multiprocessor framework and concludes that the most powerful model shows the lowest relative cost and the highest performance/area and performance/power figures.

Proceedings Article

REPLICA T7-16-128 — A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor

Martti Forsell, +1 more

TL;DR: A 2048-threaded 16-core prototype of the REPLICA chip multiprocessor is introduced and the main principles of the architecture as well as the structure of the prototype are explained.

Journal Article

Performance comparison of some shared memory organizations for 2D mesh-like NOCs

Martti Forsell

- 01 Mar 2011 -

Microprocessors and Microsystems

TL;DR: This paper compares the performance of some shared memory organizations for chip multiprocessors (CMP) employing advanced homogeneous 2D-mesh-like NOCs and making use of emulated shared memory and non-uniform memory access models.

References

Book

Computer Architecture: A Quantitative Approach, 2nd Edition

John L. Hennessy, +1 more

Book

An introduction to parallel algorithms

Joseph JaJa

TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, with the emphasis on the application of the PRAM model of parallel computation, with all its variants, to algorithm analysis.

Proceedings Article

Cache decay: exploiting generational behavior to reduce cache leakage power

Stefanos Kaxiras, +2 more

TL;DR: This paper discusses policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused, and proposes adaptive policies that effectively reduce L1 cache leakage energy by 5x for SPEC2000 with only negligible degradation in performance.

Journal Article

How to emulate shared memory

Abhiram Ranade

- 01 Jun 1991 -

Journal of Computer and System Sciences

TL;DR: This work presents a simple algorithm for emulating an N processor CRCW PRAM on an N node butterfly that improves the result of Pippenger by routing permutations with bounded queues in logarithmic time, without the possibility of deadlock.

Journal Article

A scalable high-performance computing solution for networks on chips

M. Forsell

- 01 Sep 2002 -

IEEE Micro

TL;DR: The Eclipse network-on-a-chip architecture uses a sophisticated parallel programming model, realized through multithreaded processors, interleaved memory modules, and a high-capacity interconnection network, to support system-on-a-chip designs.
