Showing papers by "Moinuddin K. Qureshi" published in 2010

Proceedings Article

Improving read performance of Phase Change Memories via Write Cancellation and Write Pausing

Moinuddin K. Qureshi¹, Michele M. Franceschini¹, Luis A. Lastras-Montano¹

01 Apr 2010

TL;DR: This work proposes adaptive Write Cancellation policies, which can abort the processing of a scheduled write request if a read request arrives at the same bank within a predetermined period, and Write Pausing, which exploits the iterative write algorithms used in PCM to pause at the end of each write iteration to service any pending reads.

Abstract: Phase Change Memory (PCM) is emerging as a promising technology to build large-scale main memory systems in a cost-effective manner. A characteristic of PCM is that its write latency is much higher than its read latency. A higher write latency can typically be tolerated using buffers. However, once a write request is scheduled for service to a bank, it can still cause increased latency for later-arriving read requests to the same bank. We show that for the baseline PCM system with read-priority scheduling, the write requests increase the effective read latency to 2.3x (on average), causing significant performance degradation. To reduce the read latency of PCM devices under such scenarios, we propose adaptive Write Cancellation policies. Such policies can abort the processing of a scheduled write request if a read request arrives at the same bank within a predetermined period. We also propose Write Pausing, which exploits the iterative write algorithms used in PCM to pause at the end of each write iteration to service any pending reads. For the baseline system, the proposed technique removes 75% of the latency increase incurred by read requests and improves overall system performance by 46% (on average), while requiring negligible hardware and simple extensions to the PCM controller.
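
To make the two policies concrete, here is a minimal sketch (not the authors' controller) of when a single read completes if it arrives at a bank that is busy with an iterative write. The timing constants, the fixed `cancel_threshold` (the paper's policies are adaptive), and the function name `read_finish_time` are all hypothetical; the re-execution cost of a cancelled write is not modeled.

```python
# Hypothetical timing parameters in cycles; not taken from the paper.
READ_LATENCY = 50
WRITE_ITERATIONS = 8
ITERATION_LATENCY = 100
WRITE_LATENCY = WRITE_ITERATIONS * ITERATION_LATENCY


def read_finish_time(read_arrival: int, policy: str, cancel_threshold: float = 0.5) -> int:
    """Cycle at which a read completes when it arrives at a bank that began
    servicing a write at cycle 0 (0 <= read_arrival < WRITE_LATENCY).

    "baseline":     the read waits for the whole write to finish.
    "cancel":       abort the write if it is less than `cancel_threshold` done
                    (the aborted write must be re-issued later, not modeled here).
    "pause":        serve the read at the next write-iteration boundary.
    "cancel+pause": cancel early-stage writes, pause late-stage ones.
    """
    if not 0 <= read_arrival < WRITE_LATENCY:
        raise ValueError("read must arrive while the write is in progress")
    progress = read_arrival / WRITE_LATENCY
    # Ceiling division: first iteration boundary at or after the read's arrival.
    next_boundary = -(-read_arrival // ITERATION_LATENCY) * ITERATION_LATENCY
    if policy == "baseline":
        start = WRITE_LATENCY
    elif policy == "cancel":
        start = read_arrival if progress < cancel_threshold else WRITE_LATENCY
    elif policy == "pause":
        start = next_boundary
    elif policy == "cancel+pause":
        start = read_arrival if progress < cancel_threshold else next_boundary
    else:
        raise ValueError(f"unknown policy: {policy}")
    return start + READ_LATENCY
```

With these made-up numbers, a read arriving at cycle 250 finishes at 850 under the baseline but at 300 with cancellation and 350 with pausing; a read arriving late (cycle 600) gains nothing from cancellation, which is why combining both mechanisms is attractive.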

313 citations

Proceedings Article

Morphable memory system: a robust architecture for exploiting multi-level phase change memories

Moinuddin K. Qureshi¹, Michele M. Franceschini¹, Luis A. Lastras-Montano¹, John P. Karidis¹

IBM¹

19 Jun 2010

TL;DR: MMS is a robust architecture for efficiently incorporating MLC PCM devices in main memory, based on the observation that memory requirements vary between workloads and that systems are typically over-provisioned in terms of memory capacity.

Abstract: Phase Change Memory (PCM) is emerging as a scalable and power-efficient technology to architect future main memory systems. The scalability of PCM is enhanced by the property that PCM devices can store multiple bits per cell. While such Multi-Level Cell (MLC) devices can offer high density, this benefit comes at the expense of increased read latency, which can cause significant performance degradation. This paper proposes Morphable Memory System (MMS), a robust architecture for efficiently incorporating MLC PCM devices in main memory. MMS is based on the observation that memory requirements vary between workloads, and that systems are typically over-provisioned in terms of memory capacity. Therefore, during a phase of low memory usage, some of the MLC devices can be operated at fewer bits per cell to obtain lower latency. When the workload requires full memory capacity, these devices can be restored to high-density MLC operation to provide full main-memory capacity. We provide the runtime monitors, the hardware-OS interface, and the detailed mechanism for implementing MMS. Our evaluations on an 8-core 8GB MLC PCM-based system show that MMS provides, on average, low-latency access for 95% of all memory requests, thereby improving overall system performance by 40%.
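
The capacity trade-off behind MMS can be sketched as a small sizing calculation: given the current memory demand, how many devices can drop to fewer bits per cell while total capacity still covers the demand? This is an illustrative model, not the paper's runtime monitor; the function name `fast_mode_devices`, the 2-bit/1-bit modes, and the assumption that capacity scales linearly with bits per cell are all hypothetical.

```python
def fast_mode_devices(num_devices: int, device_gb: int, demand_gb: int,
                      high_bits: int = 2, low_bits: int = 1) -> int:
    """How many MLC devices can drop to `low_bits` per cell while the system
    still provides at least `demand_gb` of capacity.

    Assumes capacity scales linearly with bits per cell and that every device
    holds `device_gb` in high-density (`high_bits` per cell) mode.
    """
    if not 0 < low_bits < high_bits:
        raise ValueError("low_bits must be positive and below high_bits")
    total_gb = num_devices * device_gb
    if demand_gb > total_gb:
        raise ValueError("demand exceeds full high-density capacity")
    slack_gb = total_gb - demand_gb
    # Capacity lost per fast device is device_gb * (high_bits - low_bits) / high_bits;
    # this is floor(slack_gb / loss_per_device) in integer arithmetic.
    fast = (slack_gb * high_bits) // (device_gb * (high_bits - low_bits))
    return min(num_devices, fast)
```

For example, eight hypothetical 1 GB devices serving a 5 GB demand allow six devices in the fast mode (2 GB + 6 x 0.5 GB = 5 GB); when demand rises to the full 8 GB, every device returns to high-density operation.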

211 citations

Proceedings Article

Feedback-directed pipeline parallelism

M. Aater Suleman¹, Moinuddin K. Qureshi², Khubaib¹, Yale N. Patt¹

University of Texas at Austin¹, IBM²

11 Sep 2010

TL;DR: This paper proposes Feedback-Directed Pipelining (FDP), a software framework that chooses the core-to-stage allocation at run time; FDP first maximizes the performance of the workload and then saves power by reducing the number of active cores without impacting performance.

Abstract: Extracting high performance from Chip Multiprocessors requires that the application be parallelized. A common software technique to parallelize loops is pipeline parallelism in which the programmer/compiler splits each loop iteration into stages and each stage runs on a certain number of cores. It is important to choose the number of cores for each stage carefully because the core-to-stage allocation determines performance and power consumption. Finding the best core-to-stage allocation for an application is challenging because the number of possible allocations is large, and the best allocation depends on the input set and machine configuration. This paper proposes Feedback-Directed Pipelining (FDP), a software framework that chooses the core-to-stage allocation at run-time. FDP first maximizes the performance of the workload and then saves power by reducing the number of active cores, without impacting performance. Our evaluation on a real SMP system with two Core2Quad processors (8 cores) shows that FDP provides an average speedup of 4.2x which is significantly higher than the 2.3x speedup obtained with a practical profile-based allocation. We also show that FDP is robust to changes in machine configuration and input set.
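
The two-phase idea (maximize performance, then shed cores) can be sketched with a greedy allocator over a simple analytical model. This is a hedged illustration, not the FDP framework itself: FDP measures stages at run time, whereas here each stage's per-item time on one core (`stage_time`) is simply given, and a stage with c cores is assumed to scale perfectly to c/t items per time unit. The names `throughput` and `fdp_allocate` are hypothetical.

```python
from fractions import Fraction
from typing import List


def throughput(alloc: List[int], stage_time: List[int]) -> Fraction:
    """Items per time unit: the slowest (limiter) stage bounds the pipeline.
    A stage with c cores and per-item time t processes c/t items per unit."""
    return min(Fraction(c, t) for c, t in zip(alloc, stage_time))


def fdp_allocate(stage_time: List[int], total_cores: int) -> List[int]:
    n = len(stage_time)
    if total_cores < n:
        raise ValueError("need at least one core per stage")
    alloc = [1] * n
    # Phase 1 (performance): hand each spare core to the current limiter stage.
    for _ in range(total_cores - n):
        limiter = min(range(n), key=lambda i: Fraction(alloc[i], stage_time[i]))
        alloc[limiter] += 1
    best = throughput(alloc, stage_time)
    # Phase 2 (power): take cores away as long as throughput does not drop.
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if alloc[i] > 1:
                alloc[i] -= 1
                if throughput(alloc, stage_time) < best:
                    alloc[i] += 1  # this core was needed; give it back
                else:
                    changed = True
    return alloc
```

With per-item times of 1 and 4 and six cores, phase 1 ends at `[2, 4]`, and phase 2 returns `[1, 4]`: the same throughput with one fewer active core, which mirrors the power-saving step described in the abstract.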

62 citations

Journal Article

Accelerating Critical Section Execution with Asymmetric Multicore Architectures

M. Aater Suleman¹, Onur Mutlu², Moinuddin K. Qureshi³, Yale N. Patt¹

University of Texas at Austin¹, Carnegie Mellon University², IBM³

01 Jan 2010 - IEEE Micro

TL;DR: The proposed Accelerated Critical Sections (ACS) mechanism reduces the thread serialization caused by contention for critical sections by executing critical sections on the high-performance core of an asymmetric chip multiprocessor, which can execute them faster than the smaller cores can.

Abstract: Contention for critical sections can reduce performance and scalability by causing thread serialization. The proposed accelerated critical sections mechanism reduces this limitation. ACS executes critical sections on the high-performance core of an asymmetric chip multiprocessor (ACMP), which can execute them faster than the smaller cores can.
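
A back-of-the-envelope model shows why moving serialized critical sections to a faster core helps. It is not the paper's evaluation: it assumes the worst case in which all threads reach a single lock at once, so critical sections run back to back, and the parameters `cs_speedup` (large-core speedup) and `cs_overhead` (cost of shipping a critical section to the large core) are hypothetical.

```python
def serialized_time(n_threads: int, noncritical: float, critical: float,
                    cs_speedup: float = 1.0, cs_overhead: float = 0.0) -> float:
    """Completion time when every thread runs `noncritical` work in parallel and
    then one `critical`-long section guarded by a single shared lock.

    Worst case: all threads reach the lock together, so critical sections run
    back to back. With ACS-style execution each section runs `cs_speedup` times
    faster on the large core but pays `cs_overhead` to ship it there.
    """
    per_section = critical / cs_speedup + cs_overhead
    return noncritical + n_threads * per_section
```

With 8 threads, 100 units of parallel work, and 20-unit critical sections, the model gives 260 units on small cores alone, and 196 units if the large core runs critical sections twice as fast at a 2-unit transfer cost. The serialized term grows with the thread count, which is where the speedup matters most.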

29 citations

Patent

Adaptive Wear Leveling via Monitoring the Properties of Memory Reference Stream

Michele M. Franceschini¹, John P. Karidis¹, Luis A. Lastras-Montano¹, Moinuddin K. Qureshi¹

IBM¹

19 Nov 2010

TL;DR: In this article, a property of a write data stream is detected, a write leveling process is adapted in response to the detected property, and the adapted process generates physical addresses from the write line addresses.

Abstract: Adaptive write leveling in limited lifetime memory devices including performing a method for monitoring a write data stream that includes write line addresses. A property of the write data stream is detected and a write leveling process is adapted in response to the detected property. The write leveling process is applied to the write data stream to generate physical addresses from the write line addresses.
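
The monitor/adapt/apply loop in the claim can be sketched as follows. This is an assumed illustration rather than the patented process: the monitored property is taken to be the share of writes hitting the hottest line in a window, the leveling process is a simple rotation `physical = (line + offset) % num_lines`, and a skewed stream rotates more often. The class `AdaptiveWearLeveler` and all its parameters are hypothetical, and the data migration a real rotation requires is not modeled.

```python
from collections import Counter


class AdaptiveWearLeveler:
    """Hypothetical adaptive leveler: rotation-based remapping whose rotation
    interval shrinks when the monitored write stream is skewed."""

    def __init__(self, num_lines: int, window: int = 64, slow_interval: int = 32,
                 fast_interval: int = 4, skew_threshold: float = 0.25):
        self.num_lines = num_lines
        self.window = window
        self.slow_interval = slow_interval
        self.fast_interval = fast_interval
        self.skew_threshold = skew_threshold
        self.interval = slow_interval
        self.offset = 0
        self.writes_since_shift = 0
        self.recent = Counter()
        self.recent_total = 0

    def _monitor(self, line: int) -> None:
        # Detect a property of the stream: share of writes to the hottest line.
        self.recent[line] += 1
        self.recent_total += 1
        if self.recent_total == self.window:
            hottest_share = max(self.recent.values()) / self.window
            # Adapt the leveling process to the detected property.
            skewed = hottest_share >= self.skew_threshold
            self.interval = self.fast_interval if skewed else self.slow_interval
            self.recent.clear()
            self.recent_total = 0

    def write(self, line: int) -> int:
        """Return the physical address used for a write to logical `line`."""
        if not 0 <= line < self.num_lines:
            raise ValueError("line address out of range")
        self._monitor(line)
        physical = (line + self.offset) % self.num_lines
        self.writes_since_shift += 1
        if self.writes_since_shift >= self.interval:
            self.offset = (self.offset + 1) % self.num_lines
            self.writes_since_shift = 0
        return physical
```

Repeated writes to one logical line quickly switch the leveler to the short rotation interval, spreading that line's wear over many physical lines, while a uniform stream keeps the cheaper, slower rotation.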

26 citations

Patent

Computer memory with dynamic cell density

Michele M. Franceschini¹, John P. Karidis¹, Luis A. Lastras-Montano¹, Moinuddin K. Qureshi¹

IBM¹

09 Apr 2010

TL;DR: In this article, a first memory region includes first memory units operating at a first density, and a second memory unit operating at a second density is dynamically reassigned to the first region, after which it operates at the first density.

Abstract: A computer memory with dynamic cell density, including a method that obtains a target size for a first memory region. The first memory region includes first memory units operating at a first density. The first memory units are included in a memory in a memory system. The memory is operable at the first density and a second density. The method also includes: determining that a current size of the first memory region is not within a threshold of the target size and that the first memory region is smaller than the target size; identifying a second memory unit currently operating at the second density in a second memory region, the second memory unit included in the memory; and dynamically reassigning, during normal system operation, the second memory unit into the first memory region, the second memory unit operating at the first density after being reassigned to the first memory region.
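
The claimed steps map onto a short routine: check the first region against its target and threshold, find a unit in the second region running at the second density, and move it over at the first density. The sketch below follows those steps under stated assumptions: sizes are measured as cells times bits per cell, and the names `MemoryUnit`, `Region`, and `grow_region` are hypothetical. Migrating the data held by a re-programmed unit is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryUnit:
    cells: int
    density: int  # bits per cell the unit currently operates at

    @property
    def capacity(self) -> int:
        return self.cells * self.density


@dataclass
class Region:
    density: int
    units: List[MemoryUnit] = field(default_factory=list)

    @property
    def size(self) -> int:
        return sum(u.capacity for u in self.units)


def grow_region(first: Region, second: Region, target: int, threshold: int) -> bool:
    """Reassign one unit from `second` to `first` when `first` is smaller than
    `target` by more than `threshold`. Returns True if a unit was moved."""
    if first.size >= target or target - first.size <= threshold:
        return False  # already large enough or within the threshold
    for unit in second.units:
        if unit.density == second.density:
            second.units.remove(unit)
            unit.density = first.density  # now operates at the first density
            first.units.append(unit)
            return True
    return False  # no eligible unit found in the second region
```

For instance, a first region of one 100-cell unit at 2 bits per cell (size 200) with a target of 400 and a threshold of 50 takes a 1-bit unit from the second region, reaching 400; a second call then does nothing because the size is within the threshold.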

25 citations

Patent

Measuring data switching activity in a microprocessor

Pradip Bose¹, Alper Buyuktosunoglu¹, Christopher Gonzalez¹, Moinuddin K. Qureshi¹, Victor Zyuban¹

IBM¹

27 Jul 2010

TL;DR: In this article, a data switching activity identification mechanism is proposed for approximating data switching activities in a data processing system, which is based on the identification of a set of data storage devices and their associated bits.

Abstract: A mechanism is provided for approximating data switching activity in a data processing system. A data switching activity identification mechanism in the data processing system receives an identification of a set of data storage devices and a set of bits in the set of data storage devices in the data processing system to be monitored for the data switching activity. The data switching activity identification mechanism sums a count of the identified bits that have changed state for the data storage device along with other counts of the identified bits that have changed state for other data storage devices in the set of data storage devices to form an approximation of data switching activity. A power manager in the data processing system then adjusts a set of operational parameters associated with the data processing system using the approximation of data switching activity.
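
The approximation itself is a masked bit-flip count summed over devices, which a few lines can express. In this sketch each monitored storage device is represented by an integer sampled at two points in time, the set of monitored bits is an integer mask, and the power manager's response (stepping a clock frequency) is a hypothetical policy; `switching_activity` and `adjust_frequency` are illustrative names, not part of the patent.

```python
from typing import Sequence


def switching_activity(old: Sequence[int], new: Sequence[int],
                       masks: Sequence[int]) -> int:
    """Sum, over all monitored storage devices, the number of monitored bits
    (set in that device's mask) that changed state between two samples."""
    if not len(old) == len(new) == len(masks):
        raise ValueError("one old value, new value, and mask per device")
    return sum(bin((o ^ n) & m).count("1") for o, n, m in zip(old, new, masks))


def adjust_frequency(activity: int, freq_mhz: int, threshold: int,
                     step_mhz: int = 100, min_mhz: int = 1000,
                     max_mhz: int = 4000) -> int:
    """Hypothetical power-manager policy: lower the clock when the activity
    estimate exceeds `threshold`, otherwise raise it, within fixed bounds."""
    if activity > threshold:
        return max(min_mhz, freq_mhz - step_mhz)
    return min(max_mhz, freq_mhz + step_mhz)
```

Monitoring only a subset of bits keeps the counting hardware small; the estimate is an approximation precisely because unmonitored bits are ignored.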

4 citations

Patent

Memory access prediction

Moinuddin K. Qureshi¹

IBM¹

04 Feb 2010

TL;DR: In this paper, a cache snoop and access to physical memory are initiated in parallel for the data item if the indicator bit is a first predetermined bit (one (1) or zero (0)).

Abstract: An apparatus for memory access prediction which includes a plurality of processors, a plurality of memory caches associated with the processors, a plurality of saturation counters associated with the processors, each of the saturation counters having an indicator bit, and a physical memory shared with the processors, saturation counters and memory caches. Upon a cache miss for a data item, a cache snoop and an access to physical memory are initiated in parallel for the data item if the indicator bit is a first predetermined bit (one (1) or zero (0)), whereas only a cache snoop is initiated if the indicator (most significant) bit is a second predetermined bit (zero (0) or one (1)).
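
A saturating counter whose most significant bit acts as the indicator is a standard predictor structure, and the decision on a miss follows directly from the abstract. The training rule below (count up when a miss was ultimately served by physical memory, down when another cache supplied the data) and the choice of 1 as the "first predetermined bit" are assumptions for illustration; `SaturatingCounter` and `on_cache_miss` are hypothetical names.

```python
class SaturatingCounter:
    """n-bit saturating counter; its most significant bit is the indicator."""

    def __init__(self, bits: int = 2, value: int = 0):
        self.bits = bits
        self.max_value = (1 << bits) - 1
        self.value = value

    def update(self, served_by_memory: bool) -> None:
        # Hypothetical training rule: count up when a miss was ultimately served
        # by physical memory, down when another cache supplied the data.
        if served_by_memory:
            self.value = min(self.max_value, self.value + 1)
        else:
            self.value = max(0, self.value - 1)

    @property
    def indicator(self) -> int:
        return (self.value >> (self.bits - 1)) & 1


def on_cache_miss(counter: SaturatingCounter) -> tuple:
    """Actions launched on a miss, taking 1 as the 'first predetermined bit'."""
    if counter.indicator == 1:
        return ("snoop", "memory")  # snoop and memory access in parallel
    return ("snoop",)               # snoop only
```

Saturation gives the predictor hysteresis: after two memory-served misses a 2-bit counter predicts memory, and a single miss served by another cache does not immediately flip the prediction back.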

4 citations