Proceedings Article

Step caches - a novel approach to concurrent memory access on shared memory MP-SOCs

M.J. Forsell
- pp. 74-77
Abstract
In this paper we introduce a novel class of caches, named step caches, that can be used to implement concurrent memory access in shared memory multithreaded multiprocessor systems on chip (MP-SOCs) without cache coherency problems. The main difference between ordinary caches and step caches is that data entered into a step cache is kept valid only until the end of the ongoing step of multithreaded execution. We describe the structure and operation of step caches and give a performance evaluation of step cache systems with different settings, using simple parallel programs on our parametric MP-SOC framework. According to the evaluation, step caches speed up execution by a factor close to the number of processors with respect to a similar system without step caches, and almost achieve the performance of an ideal shared memory system in plain concurrent access.
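The validity rule described above can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the paper. It models one processor's step cache shared by that processor's threads, with unlimited fully associative capacity, one word per line, and reads only. Each line is tagged with the number of the step in which it was filled. The names `StepCache`, `read` and `next_step` are hypothetical. The sketch shows that when many threads read the same address within one step, only the first request reaches shared memory. Advancing the step then invalidates every line at once, with no coherence traffic.

```python
from dataclasses import dataclass


@dataclass
class StepCacheLine:
    data: int  # cached word
    step: int  # step number in which the line was filled


class StepCache:
    """Toy model of one processor's step cache, shared by its threads.

    Assumptions (not from the paper): unlimited, fully associative capacity,
    one word per line, reads only.
    """

    def __init__(self, memory: dict) -> None:
        self.memory = memory    # shared memory: address -> word
        self.lines = {}         # address -> StepCacheLine
        self.step = 0           # current step of multithreaded execution
        self.memory_reads = 0   # requests that actually reached shared memory

    def next_step(self) -> None:
        # Advancing the step counter makes every existing line stale at once;
        # no flush and no coherence messages are needed.
        self.step += 1

    def read(self, addr: int) -> int:
        line = self.lines.get(addr)
        if line is not None and line.step == self.step:
            return line.data  # hit: another thread already fetched addr this step
        # Miss, or a line left over from an earlier step: go to shared memory.
        self.memory_reads += 1
        value = self.memory.get(addr, 0)
        self.lines[addr] = StepCacheLine(value, self.step)
        return value


if __name__ == "__main__":
    shared = {100: 7}
    cache = StepCache(shared)
    # Step 0: eight threads read address 100 concurrently.
    values = [cache.read(100) for _ in range(8)]
    shared[100] = 9  # a write by another processor lands between steps
    cache.next_step()
    after = cache.read(100)  # stale line is ignored; the fresh value is fetched
    print(values, cache.memory_reads, after)  # prints: [7, 7, 7, 7, 7, 7, 7, 7] 2 9
```

Invalidation costs nothing in this sketch because lines are tagged with a step number instead of being flushed. A line from an earlier step simply fails the `line.step == self.step` check.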


Citations
Book Chapter

TOTAL ECLIPSE: An Efficient Architectural Realization of the Parallel Random Access Machine

TL;DR: It is hard to imagine that parallel computing would be able to continue the increasing trend of computational performance without solving problems caused by inability of current architectures to hide the latency of shared memory accesses, lack of synchronicity in execution of computational threads as well as too weak models and low-level primitives of parallel computing.
Proceedings Article

Realizing Multioperations for Step Cached MP-SOCs

M. Forsell
TL;DR: The paper proposes an architectural technique for implementing multioperations on step cached MP-SOCs even when cache associativity is limited. The technique is based on simple active memory units, faster memory modules, and small processor-level memory blocks called scratchpads.
Journal Article

On the Performance and Cost of Some PRAM Models on CMP Hardware

TL;DR: This paper measures the performance and estimates the cost of practical implementations of four PRAM models (EREW, Limited Arbitrary CRCW, Full Arbitrary CRCW, and Full Arbitrary Multioperation CRCW) on the Eclipse chip multiprocessor framework. It concludes that the most powerful model shows the lowest relative cost and the highest performance/area and performance/power figures.
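The following hedged Python sketch makes the difference between the models concrete. It resolves the writes of one synchronous step under three of the four models in the summary above. The function name `resolve_writes` and the model labels are hypothetical. The arbitrary CRCW branch picks the lowest thread id, which is one legal choice since any single winner is allowed. The multioperation branch assumes the operation is addition. Reads are not modeled, and Limited Arbitrary CRCW is omitted because its exact restriction is not stated here.

```python
from collections import defaultdict


class ExclusiveWriteError(Exception):
    """Raised when an EREW step has two writes to the same address."""


MODELS = ("EREW", "ARBITRARY_CRCW", "MULTIOP_ADD")  # hypothetical labels


def resolve_writes(memory, writes, model):
    """Apply one synchronous PRAM step of writes to `memory` (address -> word).

    `writes` is a list of (thread_id, address, value) tuples issued in the step.
    Updates are computed first and applied together, so a rejected EREW step
    leaves memory unchanged.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}")
    by_addr = defaultdict(list)
    for tid, addr, value in writes:
        by_addr[addr].append((tid, value))

    updates = {}
    for addr, reqs in by_addr.items():
        if model == "EREW":
            if len(reqs) > 1:
                raise ExclusiveWriteError(f"concurrent writes to address {addr}")
            updates[addr] = reqs[0][1]
        elif model == "ARBITRARY_CRCW":
            # Any one writer may win; this sketch deterministically picks the lowest id.
            updates[addr] = min(reqs)[1]
        else:  # MULTIOP_ADD
            # All contributions are combined into a single update of the word.
            updates[addr] = memory.get(addr, 0) + sum(v for _, v in reqs)
    memory.update(updates)
    return memory
```

Consider the writes `[(0, 5, 10), (1, 5, 20)]`. Under the arbitrary CRCW branch they leave address 5 at 10. Under the multioperation branch they add 30 to it. Under EREW they raise `ExclusiveWriteError`.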
Proceedings Article

REPLICA T7-16-128 — A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor

TL;DR: A 2048-threaded 16-core prototype of the REPLICA chip multiprocessor is introduced and the main principles of the architecture as well as the structure of the prototype are explained.
Journal Article

Performance comparison of some shared memory organizations for 2D mesh-like NOCs

TL;DR: This paper compares the performance of some shared memory organizations for chip multiprocessors (CMP) employing advanced homogeneous 2D-mesh-like NOCs and making use of emulated shared memory and non-uniform memory access models.
References
Book

Computer Architecture: A Quantitative Approach, 2nd Edition

TL;DR: A textbook on computer architecture that evaluates design choices quantitatively, in terms of measured performance and cost.
Book

An introduction to parallel algorithms

TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, with the emphasis on the application of the PRAM model of parallel computation, with all its variants, to algorithm analysis.
Proceedings Article

Cache decay: exploiting generational behavior to reduce cache leakage power

TL;DR: This paper discusses policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused, and proposes adaptive policies that reduce L1 cache leakage energy by 5x for SPEC2000 with only negligible performance degradation.
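The decay policy summarized above can be sketched as follows. This is a simplified Python model, not the paper's circuit. The class name `DecayCache`, the `tick` period and the `decay_ticks` threshold are hypothetical. Lines are identified only by address, and a "turned off" line is modeled by deleting it. Each access resets a line's idle counter. A line that stays idle for `decay_ticks` consecutive ticks is switched off, and the next access to it becomes an extra miss.

```python
class DecayCache:
    """Toy model of cache decay: idle lines are switched off to save leakage.

    Hypothetical simplifications: unlimited capacity, lines keyed by address,
    and a switched-off line is represented by removing it from `lines`.
    """

    def __init__(self, decay_ticks: int) -> None:
        self.decay_ticks = decay_ticks  # idle ticks before a line is turned off
        self.lines = {}                 # address -> idle tick count (powered lines only)
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        if addr in self.lines:
            self.hits += 1
        else:
            self.misses += 1  # never loaded, or decayed: refetch from the next level
        self.lines[addr] = 0  # any access resets the line's idle counter

    def tick(self) -> None:
        # One decay-counter period elapses for every powered line.
        for addr in list(self.lines):
            self.lines[addr] += 1
            if self.lines[addr] >= self.decay_ticks:
                del self.lines[addr]  # "turn off" the line

    def powered_lines(self) -> int:
        return len(self.lines)


if __name__ == "__main__":
    cache = DecayCache(decay_ticks=2)
    cache.access(1)
    cache.access(2)
    cache.tick()
    cache.access(1)  # line 1 is reused, so its counter resets
    cache.tick()     # line 2 has now been idle for 2 ticks and is turned off
    print(cache.powered_lines())  # prints: 1
    cache.access(2)  # decay-induced miss
    print(cache.hits, cache.misses)  # prints: 1 3
```

The decay interval sets the trade-off. A short interval switches lines off sooner and saves more leakage, but it causes more decay-induced misses like the last access above.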
Journal Article

How to emulate shared memory

TL;DR: This work presents a simple algorithm for emulating an N processor CRCW PRAM on an N node butterfly that improves the result of Pippenger by routing permutations with bounded queues in logarithmic time, without the possibility of deadlock.
Journal Article

A scalable high-performance computing solution for networks on chips

M. Forsell
- 01 Sep 2002
TL;DR: The Eclipse network-on-a-chip architecture uses a sophisticated parallel programming model, realized through multithreaded processors, interleaved memory modules, and a high-capacity interconnection network, to support system-on-a-chip designs.