Per Stenström

Researcher at Chalmers University of Technology

Publications - 251
Citations - 8514

Per Stenström is an academic researcher from Chalmers University of Technology. The author has contributed to research on topics including Cache and Cache coherence. The author has an h-index of 43 and has co-authored 245 publications receiving 8193 citations. Previous affiliations of Per Stenström include Stanford University and Ericsson.

Papers
Proceedings Article

Modelling accesses to migratory and producer-consumer characterised data in a shared memory multiprocessor

TL;DR: Identifies a set of parameters that characterises accesses to migratory and producer-consumer data in sufficient detail to predict the number of cache misses in directory-based, write-invalidate protocols.
Book

Transactions on High-Performance Embedded Architectures and Compilers I

TL;DR: A High Performance Adaptive Miss Handling Architecture for Chip Multiprocessors and Characterizing Time-Varying Program Behavior Using Phase Complexity Surfaces.
Journal Article

Performance evaluation and cost analysis of cache protocol extensions for shared-memory multiprocessors

TL;DR: It is shown that a basic write-invalidate protocol augmented by appropriate extensions can eliminate most memory access penalties without any support from the programmer or the compiler.
Proceedings Article

Using hints to reduce the read miss penalty for flat COMA protocols

TL;DR: Studies a new hint-based protocol that sends a request simultaneously to the potential holder and to the directory. It reduces the read-miss penalty for all applications, but the performance improvement does not seem to justify the added protocol complexity.
Journal Article

Using dataflow analysis techniques to reduce ownership overhead in cache coherence protocols

TL;DR: Static analysis using classical dataflow techniques to remove ownership overhead in write-invalidate cache coherence protocols for shared-memory multiprocessors yields performance improvements similar to those of dynamic hardware-based approaches.