
Showing papers on "Memory management published in 1993"


Journal ArticleDOI
TL;DR: This paper proposes a new methodology for constructing lock-free and wait-free implementations of concurrent objects, presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate by applying atomic read, write, load_linked, and store_conditional operations to a shared memory.
Abstract: A concurrent object is a data structure shared by concurrent processes. Conventional techniques for implementing concurrent objects typically rely on critical sections, ensuring that only one process at a time can operate on the object. However, critical sections are poorly suited for asynchronous systems: if one process is halted or delayed in a critical section, other, nonfaulty processes will be unable to progress. By contrast, a concurrent object implementation is lock-free if it always guarantees that some process will complete an operation in a finite number of steps, and it is wait-free if it guarantees that each process will complete an operation in a finite number of steps. This paper proposes a new methodology for constructing lock-free and wait-free implementations of concurrent objects. The object's representation and operations are written as stylized sequential programs, with no explicit synchronization. Each sequential operation is automatically transformed into a lock-free or wait-free operation using novel synchronization and memory management algorithms. These algorithms are presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate by applying atomic read, write, load_linked, and store_conditional operations to a shared memory.
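
As a sketch of the retry pattern such a transformation produces, consider incrementing a shared counter lock-free: read the current version, build a private updated copy, and publish it atomically, retrying on interference. This minimal C example uses C11 compare-exchange as a stand-in for the paper's load_linked/store_conditional primitives; all names are illustrative, and safely reclaiming the displaced copy is exactly the memory management problem the paper's algorithms address.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object: a counter. All names are illustrative. */
typedef struct { long value; } object_t;

static _Atomic(object_t *) shared_obj;

void shared_obj_init(void) {
    atomic_store(&shared_obj, calloc(1, sizeof(object_t)));
}

/* Apply a sequential operation lock-free: copy, mutate, publish.
 * C11 compare-exchange stands in for load_linked/store_conditional;
 * real LL/SC also avoids the ABA problem this sketch glosses over. */
long lock_free_increment(void) {
    for (;;) {
        object_t *old = atomic_load(&shared_obj);   /* "load_linked" */
        object_t *copy = malloc(sizeof *copy);
        copy->value = old->value + 1;               /* sequential op on a private copy */
        /* "store_conditional": succeeds only if nobody replaced old. */
        if (atomic_compare_exchange_weak(&shared_obj, &old, copy))
            return copy->value;   /* old must still be reclaimed safely --
                                     the paper's memory management problem */
        free(copy);                                 /* conflict: retry */
    }
}
```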

553 citations


Journal ArticleDOI
TL;DR: Experiments conducted on a 96-node Butterfly GP-1000 clearly show the advantage of trapezoid self-scheduling over other well-known self-scheduling approaches.
Abstract: A practical processor self-scheduling scheme, trapezoid self-scheduling, is proposed for arbitrary parallel nested loops in shared-memory multiprocessors. Generally, loops are the richest source of parallelism in parallel programs. Dynamically allocating loop iterations to processors achieves load balancing among processors at the expense of run-time scheduling overhead. By linearly decreasing the chunk size at run time, the proposed trapezoid self-scheduling approach obtains the best tradeoff between scheduling overhead and balanced workload. Due to its simplicity and flexibility, this approach can be efficiently implemented in any parallel compiler. The small and predictable number of chores also allows efficient management of memory in a static fashion. Experiments conducted on a 96-node Butterfly GP-1000 clearly show the advantage of trapezoid self-scheduling over other well-known self-scheduling approaches.
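
A minimal C sketch of the linearly decreasing chunk schedule (assuming the commonly cited defaults of a first chunk f = n/(2p) and a last chunk of 1, and n much larger than p; the exact parameters are the paper's to choose):

```c
#include <stdio.h>

/* Trapezoid self-scheduling chunk sizes: decrease linearly from a
 * first chunk f to a last chunk l. Assumes n >> p so that f >= 1
 * and at least two chunks exist. */
void tss_chunks(long n, int p) {
    long f = n / (2 * p);                   /* first (largest) chunk */
    long l = 1;                             /* last (smallest) chunk */
    long steps = (2 * n) / (f + l);         /* number of chunks, about 2n/(f+l) */
    double delta = (double)(f - l) / (double)(steps - 1);  /* per-chunk decrement */
    double size = (double)f;
    long assigned = 0;
    for (long i = 0; i < steps && assigned < n; i++) {
        long chunk = (long)size;
        if (chunk > n - assigned) chunk = n - assigned;
        printf("chunk %ld: %ld iterations\n", i, chunk);
        assigned += chunk;
        size -= delta;
    }
    if (assigned < n)                       /* leftover from rounding */
        printf("tail chunk: %ld iterations\n", n - assigned);
}
```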

279 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: It is shown that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.
Abstract: This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our algorithms produce an SPMD (single program multiple data) program to be run on each processor. Our compiler generates the necessary receive and send instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large messages, allocates space locally on each processor, and translates global data addresses to local addresses. Our techniques are based on an exact data-flow analysis on individual array element accesses. Unlike data dependence analysis, this analysis determines whether two dynamic instances refer to the same value, not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis. Our technique is based on a uniform framework, in which data decompositions, computation decompositions, and the data flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation, and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.
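
One standard way to project a polyhedron described by linear inequalities onto a lower-dimensional space is Fourier-Motzkin elimination; the excerpt above does not name the paper's projection method, so the following is only an illustrative instance. To eliminate a variable, combine each of its lower bounds with each of its upper bounds:

\[
\{(x, y) : 0 \le x,\; x \le y,\; y \le N\} \;\xrightarrow{\text{eliminate } y}\; \{x : 0 \le x \le N\},
\]

since the lower bound \(x \le y\) and the upper bound \(y \le N\) combine to \(x \le N\). Applied to the inequalities describing a computation decomposition, such a projection yields, for example, the set of loop iterations a given processor must execute.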

241 citations


Patent
23 Feb 1993
TL;DR: A memory management and protection system for achieving high-speed execution and proper, flexible memory access control for multiple programs sharing an identical logical address space. Memory access is permitted according to a segment identifier identifying a segment in the logical address space, together with memory protection information for a region in each segment, including a target right permission indicating the rights assigned to memory accesses from the region to each of the segments and an execution permission indicating the type of access permitted by the right permission.
Abstract: A memory management and protection system for realizing high-speed execution and proper, flexible memory access control for multiple programs sharing an identical logical address space. In the system, memory access is permitted according to a segment identifier identifying a segment in the logical address space, and memory protection information for a region in each segment, including a target right permission indicating the rights assigned to make memory accesses from the region to each of the segments, and an execution permission indicating the type of memory access permitted by the right permission. Alternatively, a memory access can be permitted by using an access control list attached to each address table entry, which stores a plurality of program numbers identifying programs permitted to access the logical address stored in that entry; the one matching the current program number is searched for among them. It is also preferable to allocate a plurality of programs, within the limit of the available memory protection capacity, to an identical logical address space, without any overlap between adjacently allocated address regions.
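
A rough C sketch of the access-control-list variant (all structure and function names are ours, not the patent's): the entry stores the program numbers permitted to access its logical address, and the check searches for the current program number among them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout of an address-table entry with an attached
 * access control list of program numbers permitted to access the
 * logical address stored in the entry. */
#define ACL_SLOTS 4

typedef struct {
    unsigned long logical_addr;
    unsigned      acl[ACL_SLOTS];   /* program numbers with access */
    size_t        acl_len;
} addr_table_entry;

/* Permit the access only if the current program number appears
 * in the entry's access control list. */
bool access_permitted(const addr_table_entry *e, unsigned current_program) {
    for (size_t i = 0; i < e->acl_len; i++)
        if (e->acl[i] == current_program)
            return true;
    return false;
}
```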

195 citations


Patent
Darrell L. Cox1
12 Oct 1993
TL;DR: An expandable memory system as discussed by the authors includes a central memory controller and one or more plug-in memory modules, each memory module having an on-board memory module controller coupled in a serial network architecture which forms a memory command link.
Abstract: An expandable memory system including a central memory controller and one or more plug-in memory modules, each memory module having an on-board memory module controller coupled in a serial network architecture which forms a memory command link. Each memory module controller is serially linked to the central memory controller. The memory system is automatically configured by the central controller: each memory module in the system is assigned a base address, in turn, to define a contiguous memory space without user intervention or the requirement to physically reset switches. The memory system includes the capability to disable and bypass bad memory modules and reassign memory addresses without leaving usable memory unallocated.
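
A sketch of the auto-configuration pass in C (structure names are invented for illustration): walk the modules in serial-link order, bypass bad ones, and hand each good module the next base address so the memory space stays contiguous.

```c
/* Central-controller view of the modules on the serial command link. */
typedef struct {
    unsigned long size;     /* module capacity in bytes */
    int           bad;      /* failed self-test? */
    unsigned long base;     /* assigned base address */
    int           enabled;
} mem_module;

void configure(mem_module *mods, int n) {
    unsigned long next_base = 0;
    for (int i = 0; i < n; i++) {
        if (mods[i].bad) {              /* disable and bypass */
            mods[i].enabled = 0;
            continue;
        }
        mods[i].base = next_base;       /* contiguous assignment, in turn */
        mods[i].enabled = 1;
        next_base += mods[i].size;      /* no usable memory left unallocated */
    }
}
```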

178 citations


Journal ArticleDOI
TL;DR: The design of the Sparcle chip, which incorporates mechanisms required for massively parallel systems in a Sparc RISC core, is described, and its fine-grain computation, memory latency tolerance, and efficient message interface are discussed.
Abstract: The design of the Sparcle chip, which incorporates mechanisms required for massively parallel systems in a Sparc RISC core, is described. Coupled with a communications and memory management chip (CMMU), Sparcle allows a fast, 14-cycle context switch, an 8-cycle user-level message send, and fine-grain full/empty-bit synchronization. Sparcle's fine-grain computation, memory latency tolerance, and efficient message interface are discussed. The implementation of Sparcle as a CPU for the Alewife machine is described.

170 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: Research is described that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated; this can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
Abstract: Dynamic storage allocation is used heavily in many application areas including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated. Using five significant, allocation-intensive C programs, we show that a large fraction of all bytes allocated are short-lived (> 90% in all cases). Furthermore, we describe an algorithm for lifetime prediction that accurately predicts the lifetimes of 42–99% of all objects allocated. We describe and simulate a storage allocator that takes advantage of lifetime prediction of short-lived objects and show that it can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
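
One plausible way to exploit such predictions, sketched in C (the routing policy and all names are our illustration, not the paper's allocator design): place objects predicted short-lived in a bump-pointer arena that can be reclaimed in bulk, and send the rest to the general allocator.

```c
#include <stdlib.h>

/* Objects predicted short-lived go to a cheap bump-pointer arena that
 * is reclaimed in bulk; the rest fall back to malloc. */
#define ARENA_BYTES (1 << 20)

static _Alignas(16) char arena[ARENA_BYTES];
static size_t arena_top;

void *alloc(size_t n, int predicted_short_lived) {
    n = (n + 15) & ~(size_t)15;             /* keep 16-byte alignment */
    if (predicted_short_lived && arena_top + n <= ARENA_BYTES) {
        void *p = arena + arena_top;        /* fast path: bump the pointer */
        arena_top += n;
        return p;
    }
    return malloc(n);                       /* long-lived, or arena full */
}

/* Reclaim every short-lived object at once. */
void arena_reset(void) { arena_top = 0; }
```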

169 citations


Patent
Peter T. McLean1, Allen Cuccio1
27 Oct 1993
TL;DR: In this paper, a secure memory card is set in a secure mode to prevent unauthorized access to the data stored on the memory card, even when removed from the computer system and subsequently inserted back into that or another computer system.
Abstract: A computer system having a memory card (301) for storing data that is capable of being removed and reinserted and also having the capability of safeguarding the data stored thereon. A password is stored on the memory card. The memory card is set in a secured mode to prevent unauthorized access to the data stored on the memory card. Once the memory card is set in secure mode, it remains in secure mode, even when removed from the computer system (302) and subsequently inserted back into that or another computer system. Access to the data is permitted when the memory card is set in secure mode only if a valid password is provided to the memory card.
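
A card-side state machine consistent with this description might look as follows in C (field and function names are hypothetical; on a real card the password and secure-mode flag would live in nonvolatile storage so they persist across removal):

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    char password[16];   /* stored on the card (nonvolatile in practice) */
    bool secure_mode;    /* persists across removal and reinsertion */
    bool unlocked;       /* reset on every insertion/power-up */
} memory_card;

void card_set_secure(memory_card *c) { c->secure_mode = true; }  /* sticky once set */

void card_power_up(memory_card *c) { c->unlocked = false; }

/* Access is granted only after a valid password in secure mode. */
bool card_unlock(memory_card *c, const char *attempt) {
    if (!c->secure_mode) return true;
    if (strncmp(c->password, attempt, sizeof c->password) == 0)
        c->unlocked = true;
    return c->unlocked;
}

bool card_read_allowed(const memory_card *c) {
    return !c->secure_mode || c->unlocked;
}
```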

155 citations


Proceedings ArticleDOI
01 May 1993
TL;DR: The goal is to quantify the floating point, memory, I/O and communication requirements of highly parallel scientific applications that perform explicit communication and develop analytical models for the effects of changing the problem size and the degree of parallelism.
Abstract: This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers of processors, we develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications. We use the results to evaluate the trade-offs in the design of multicomputer architectures.

141 citations


Patent
29 Jun 1993
TL;DR: In this paper, a memory analysis system analyzes memory "events", i.e., the allocation or deallocation of memory locations associated with the execution of an application program and produces a graphic display associating dynamically allocated memory segments with various program sub-routines.
Abstract: A memory analysis system analyzes memory "events," i.e., the allocation or deallocation of memory locations, associated with the execution of an application program and produces a graphic display associating dynamically allocated memory segments with various program sub-routines. The system communicates with a debugger to trace the memory allocation routines back through the source code version of the application program and produce a call-stack, which lists the various source code sub-routines associated with the allocation of the segment and makes available the applicable lines of the source code. The system assigns to each of these locations a segment type, which relates to the program sub-routine that calls for it. The system includes a kernel processor that replaces calls to memory allocation and deallocation routines in the program with substitute routines that include instructions to notify the kernel processor each time a memory event occurs. The kernel processor monitors the response from the operating system and sends to a main processor included in the system a message that indicates the type of event, identifies the memory locations involved, and includes related information from the debugger tables which identifies the associated source code sub-routines. The main processor then controls the graphic display of the information. The kernel and the main processors communicate through a section of global memory that is set up as one or more circular queues. The kernel processor suspends the execution of the application program whenever the memory event queue is full, to maintain the application program and the display in relative synchronism.
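
The substitution of allocation routines can be pictured as simple wrappers that emit an event per call, sketched here in C (the notification channel is a stub; the patent routes events through a shared-memory circular queue to the kernel processor):

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for enqueueing an event onto the monitor's circular queue. */
static void notify_event(const char *kind, void *addr, size_t size) {
    fprintf(stderr, "event %s addr=%p size=%zu\n", kind, addr, size);
}

/* Substitute routines that replace direct calls to malloc/free,
 * notifying the monitor on every memory event. */
void *traced_malloc(size_t size) {
    void *p = malloc(size);
    notify_event("alloc", p, size);
    return p;
}

void traced_free(void *p) {
    notify_event("free", p, 0);
    free(p);
}
```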

129 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: It is shown how the design of a memory allocator can significantly affect the reference locality for various applications, and measurements suggest an allocator design that is both very fast and has good locality of reference.
Abstract: The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simulation of five large allocation-intensive C programs. We show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance by increasing both paging and cache miss rates. While increased paging can be debilitating on any architecture, cache miss rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benefits of space efficiency. At the other extreme, algorithms can expend considerable effort to increase reference locality yet gain little in total execution performance. Our measurements suggest an allocator design that is both very fast and has good locality of reference.

Proceedings ArticleDOI
01 Aug 1993
TL;DR: An elegant deterministic load balancing strategy for distribution sort that is applicable to a wide variety of parallel disks and parallel memory hierarchies, with both single and parallel processors, together with a demonstration of how to sort deterministically in parallel memory hierarchies.
Abstract: We present an elegant deterministic load balancing strategy for distribution sort that is applicable to a wide variety of parallel disks and parallel memory hierarchies with both single and parallel processors. The simplest application of the strategy is an optimal deterministic algorithm for external sorting with multiple disks and parallel processors. In each input/output (I/O) operation, each of the D ≥ 1 disks can simultaneously transfer a block of B contiguous records. Our two measures of performance are the number of I/Os and the amount of work done by the CPU(s); our algorithm is simultaneously optimal for both measures. We also show how to sort deterministically in parallel memory hierarchies. When the processors are interconnected by any sort of a PRAM, our algorithms are optimal for all parallel memory hierarchies; when the interconnection network is a hypercube, our algorithms are either optimal or best-known.
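
For context, the optimal number of I/Os for external sorting, which a simultaneously optimal algorithm must match, is the well-known Aggarwal-Vitter bound (not quoted in this excerpt):

\[
\Theta\!\left(\frac{N}{DB} \cdot \frac{\log(N/B)}{\log(M/B)}\right)
\]

I/Os, where \(N\) is the number of records, \(M\) the internal memory capacity, \(B\) the block size, and \(D\) the number of disks.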

Proceedings ArticleDOI
22 Feb 1993
TL;DR: The KSR 1 bridges the gap between the historical shared memory model and massively parallel processors (MPPs) by delivering the shared memory programming model and all of its benefits in a scalable, highly parallel architecture.
Abstract: The KSR 1 bridges the gap between the historical shared memory model and massively parallel processors (MPPs) by delivering the shared memory programming model and all of its benefits in a scalable, highly parallel architecture. The KSR 1 runs a broad range of mainstream applications, ranging from numerically intensive computation to online transaction processing (OLTP) and database management and inquiry. The use of shared memory makes possible a standards-based open environment. The KSR 1's shared memory programming model is made possible by a new architectural technique called ALLCACHE memory.

Proceedings ArticleDOI
01 May 1993
TL;DR: This work explores software-managed TLB design tradeoffs and their interaction with a range of operating systems including monolithic and microkernel designs, and measures TLB performance for benchmarks running on a MIPS R2000-based workstation.
Abstract: An increasing number of architectures provide virtual memory support through software-managed TLBs. However, software management can impose considerable penalties, which are highly dependent on the operating system's structure and its use of virtual memory. This work explores software-managed TLB design tradeoffs and their interaction with a range of operating systems including monolithic and microkernel designs. Through hardware monitoring and simulations, we explore TLB performance for benchmarks running on a MIPS R2000-based workstation running Ultrix, OSF/1, and three versions of Mach 3.0. Results: new operating systems are changing the relative frequency of different types of TLB misses, some of which may not be efficiently handled by current architectures. For the same application binaries, total TLB service time varies by as much as an order of magnitude under different operating systems. Reducing the handling cost for kernel TLB misses reduces total TLB service time up to 40%. For TLBs between 32 and 128 slots, each doubling of the TLB size reduces total TLB service time up to 50%.
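
The mechanism being measured is small: on a miss the hardware traps to a software handler that walks the page table and refills the TLB. A schematic C version (the single-level page-table layout and the tlb_write primitive are stand-ins, not the MIPS R2000 interface):

```c
#include <stdint.h>

/* Software-managed TLB refill: the hardware traps here on a miss;
 * the handler looks up the translation and writes it into the TLB.
 * tlb_write() stands in for the privileged refill instruction. */
#define PAGE_SHIFT 12
#define PT_ENTRIES (1u << 20)

extern uint32_t page_table[PT_ENTRIES];   /* VPN -> PFN | valid bit */
extern void tlb_write(uint32_t vpn, uint32_t entry);
extern void page_fault(uint32_t vaddr);

void tlb_miss_handler(uint32_t bad_vaddr) {
    uint32_t vpn = bad_vaddr >> PAGE_SHIFT;
    uint32_t pte = page_table[vpn];
    if (pte & 1u)                 /* valid: refill the TLB and retry */
        tlb_write(vpn, pte);
    else                          /* invalid: escalate to a page fault */
        page_fault(bad_vaddr);
}
```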

Proceedings ArticleDOI
07 Nov 1993
TL;DR: A multiport memory allocation problem for hierarchical, i.e. multi-dimensional, data streams is described and a memory allocation algorithm is presented which only considers interconnect costs, but memory size and other cost factors can be taken into account.
Abstract: A multiport memory allocation problem for hierarchical, i.e. multi-dimensional, data streams is described. Memory allocation techniques are used in high level synthesis for foreground and background memory allocation, the design of data format converters, and the design of synchronous inter-processor communication hardware. The techniques presented in this paper differ from other approaches in the sense that data streams are considered to be design entities and are not expanded to individual samples. A formal model for hierarchical data streams is given and a memory allocation algorithm is presented. The algorithm comprises two steps: data routing and assignment of signal delays to memories. A number of sub-problems are formulated as ILP programs. In the presented form, the allocation algorithm only considers interconnect costs, but memory size and other cost factors can be taken into account. The presented work is implemented in the memory allocation tool MEDEA which is part of the PHIDEO synthesis system.

Patent
20 Sep 1993
TL;DR: In this article, a technique for transferring data between system memory which is arranged in pages and an attached storage system is described, which is useful with storage systems that do not support scatter-gather and comprises determining for each data transfer the identity of any requested sector which lies completely within a physical page; and for those sectors which lie within a physically-aligned pages, transferring the sectors directly between secondary storage and memory by DMA.
Abstract: Described is a technique which finds use in transferring data between system memory which is arranged in pages and an attached storage system. In such a paged memory, data which crosses pages having contiguous virtual addresses may map to data which crosses discontiguous physical pages. Scatter-gather is advantageously employed in such a system in order to achieve the transfer data directly between memory and storage usually by Direct Memory Access (DMA). A secondary storage device which supports scatter-gather usually includes hardware which will perform the necessary calculations to transfer the data to and from the correct locations in physical memory. The technique of the present invention is useful with storage systems that do not support scatter-gather and comprises determining for each data transfer the identity of any requested sector which lies completely within a physical page and the identity of any sector which crosses boundaries between discontiguous physical pages; and for those sectors which lie within a physical page, transferring the sectors directly between secondary storage and memory by DMA; and for those sectors which cross said boundaries, transferring each sector to either the memory or secondary storage via an intermediate buffer.
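
The per-sector decision reduces to a page-crossing test, sketched in C (helper functions and the single-page criterion follow the abstract's description; names are ours):

```c
#include <stdint.h>

/* For a controller without scatter-gather: sectors wholly inside one
 * physical page go by direct DMA; sectors straddling a page boundary
 * go through an intermediate (bounce) buffer. */
#define PAGE_SIZE   4096u
#define SECTOR_SIZE 512u

extern uint64_t virt_to_phys(uint64_t vaddr);
extern void dma_transfer(uint64_t phys, uint32_t len);
extern void bounce_transfer(uint64_t vaddr, uint32_t len);

void transfer_sector(uint64_t vaddr) {
    uint64_t first_page = virt_to_phys(vaddr) / PAGE_SIZE;
    uint64_t last_page  = virt_to_phys(vaddr + SECTOR_SIZE - 1) / PAGE_SIZE;
    if (first_page == last_page)          /* lies completely within a page */
        dma_transfer(virt_to_phys(vaddr), SECTOR_SIZE);
    else                                  /* crosses a page boundary */
        bounce_transfer(vaddr, SECTOR_SIZE);
}
```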

Proceedings ArticleDOI
01 Dec 1993
TL;DR: By comparing the performance of implementations that make use of page-protection primitives with others that do not, the results show that for certain applications software solutions outperform those that rely on page-protection or other related virtual memory primitives.
Abstract: Many operating systems allow user programs to specify the protection level (inaccessible, read-only, read-write) of pages in their virtual memory address space, and to handle any protection violations that may occur. Such page-protection techniques have been exploited by several user-level algorithms for applications including generational garbage collection and persistent stores. Unfortunately, modern hardware has made efficient handling of page protection faults more difficult. Moreover, page-sized granularity may not match the natural granularity of a given application. In light of these problems, we reevaluate the usefulness of page-protection primitives in such applications, by comparing the performance of implementations that make use of the primitives with others that do not. Our results show that for certain applications software solutions outperform solutions that rely on page-protection or other related virtual memory primitives.
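
The primitive under evaluation is the classic mprotect/SIGSEGV write-barrier pattern: write-protect a region, catch the fault, record the dirtied page, then unprotect it so the write can proceed. A minimal POSIX sketch (error handling omitted; the dirty-page bookkeeping is left as a comment):

```c
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    void *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1));
    /* ...record `page` as dirty for, e.g., a generational collector... */
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);  /* let the write retry */
}

void install_write_barrier(void *region, size_t len) {
    struct sigaction sa = {0};
    page_size = sysconf(_SC_PAGESIZE);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, 0);
    mprotect(region, len, PROT_READ);     /* write-protect the region */
}
```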

Patent
08 Apr 1993
TL;DR: In this article, the authors present a system including a terminal connected by a transmission line to a central processing unit, the terminal including a memory divided into a program memory and a working memory of the RAM type, the program memory in turn including a volatile memory, a safeguarded memory of EEPROM type or RAM type with batteries, and a resident memory of ROM or PROM type.
Abstract: The present invention relates to a system including a terminal connected by a transmission line to a central processing unit, the terminal including a memory divided into a program memory and a working memory of the RAM type, the program memory in turn including a volatile memory, a safeguarded memory of the EEPROM type or RAM type with batteries, and a resident memory of the ROM or PROM type, characterized in that each of the memories comprising the program memory is divided into a noncertified zone, the terminal including an interpreter program for interpreting between a program written in a high-level universal compact language and the language specific to the microprocessor of the terminal, this interpreter program being capable of access to each of the memory divisions, and a remote loading monitoring program including at least one instruction CHSB, the command word of which is stored in one of the registers and expresses the remote loading possibilities of the various zones.

Journal ArticleDOI
TL;DR: Mtool augments a program with low overhead instrumentation which perturbs the program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks.
Abstract: The authors describe Mtool, a software tool for analyzing performance losses in shared memory parallel programs. Mtool augments a program with low overhead instrumentation which perturbs the program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks. After running the instrumented version of the parallel program, the programmer can use Mtool's window-based user interface to view compute time, memory, and synchronization objects. The authors describe Mtool's low overhead instrumentation methods, memory bottleneck detection technique, and attention focusing mechanisms, contrast Mtool with other approaches, and offer a case study to demonstrate its effectiveness.

Proceedings Article
24 Aug 1993
TL;DR: This paper proposes and evaluates an approach to DBMS memory management that addresses multiclass workloads with per-class response time goals by monitoring per-class database reference frequencies.
Abstract: In this paper we propose and evaluate an approach to DBMS memory management that addresses multiclass workloads with per-class response time goals. It operates by monitoring per-class database reference frequencies as well as the state of the system relative to the goals of each class; the information that it gathers is used to help existing memory allocation and page replacement mechanisms avoid making decisions that may jeopardize performance goals.

Proceedings ArticleDOI
01 Jul 1993
TL;DR: This paper develops abstractions for the programmer to supply optional information about the data reference patterns of the program and demonstrates the effectiveness of these techniques by applying them to several applications chosen from the SPLASH parallel benchmark suite.
Abstract: Large-scale shared memory multiprocessors typically support a multilevel memory hierarchy consisting of per-processor caches, a local portion of shared memory, and remote shared memory. On such machines, the performance of parallel programs is often limited by the high latency of remote memory references. In this paper we explore how knowledge of the underlying memory hierarchy can be used to schedule computation and distribute data structures, and thereby improve data locality. Our study is done in the context of COOL, a concurrent object-oriented language developed at Stanford. We develop abstractions for the programmer to supply optional information about the data reference patterns of the program. This information is used by the runtime system to distribute tasks and objects so that the tasks execute close (in the memory hierarchy) to the objects they reference. We demonstrate the effectiveness of these techniques by applying them to several applications chosen from the SPLASH parallel benchmark suite. Our experience suggests that improving data locality can be simple through a combination of programmer abstractions and smart runtime scheduling.

Proceedings ArticleDOI
01 Aug 1993
TL;DR: Three shared memory simulations on distributed memory machines (DMMs) are presented, which use universal hashing to distribute the shared memory cells over the memory modules of the DMM; the analysis utilizes a new combinatorial lemma, which may be of independent interest.
Abstract: We present three shared memory simulations on distributed memory machines (DMMs), which use universal hashing to distribute the shared memory cells over the memory modules of the DMM. We measure their quality in terms of delay, time-processor efficiency, memory contention (how many requests have to be satisfied by one memory module per simulated step), and simplicity. Further, we take into consideration different rules for resolving access conflicts at the modules of the DMM, in particular the c-collision rule motivated by the idea of communicating between processors and modules using an optical crossbar. All simulations are very simple and deterministic (except for the random choice of the hash functions). In particular, we present the first "deterministic" time-processor optimal simulations with delay O(log n), both on Arbitrary DMMs and 2-collision DMMs. (These models are defined in the paper.) The central idea for the latter simulation also yields a simple "deterministic" simulation of an n-processor PRAM on an n-processor 3-collision DMM with delay bounded by O(log log n) with high probability. For the time analysis of the simulations we utilize a new combinatorial lemma, which may be of independent interest. The lemma concerns events defined by properties of the color classes in random colorings of finite sets. Such events are not independent; the lemma shows that in an important special case such events are "negatively correlated", and thus, for the purpose of upper bounds on certain probabilities, may be treated as if independent.
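
Universal hashing here means choosing the mapping from shared memory cells to modules at random from a small family. For concreteness, a textbook Carter-Wegman family is shown below in C (the paper's exact hash class is not given in this excerpt; names are ours):

```c
#include <stdint.h>
#include <stdlib.h>

/* Universal family h(x) = ((a*x + b) mod p) mod m, mapping shared
 * memory cell addresses (assumed < 2^31 - 1) to one of m modules. */
#define PRIME 2147483647ull   /* Mersenne prime 2^31 - 1 */

typedef struct { uint64_t a, b, m; } uhash;

uhash uhash_pick(uint64_t modules) {
    uhash h;
    h.a = 1 + (uint64_t)rand() % (PRIME - 1);   /* a in [1, p-1] */
    h.b = (uint64_t)rand() % PRIME;             /* b in [0, p-1] */
    h.m = modules;
    return h;
}

/* Module responsible for shared memory cell x. */
uint64_t module_of(const uhash *h, uint64_t x) {
    return ((h->a * x + h->b) % PRIME) % h->m;
}
```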

Patent
29 Apr 1993
TL;DR: In this article, a telephonic switching system (11) with a switch controlled by a central processing unit (12) to interconnect interior communication units (14, 15, 16) in accordance with data in a shared data memory (18) with data that is alterable in response to signals generated by the communication units.
Abstract: A telephonic switching system (11) with a switch (10) controlled by a central processing unit (12) to interconnect interior communication units (14) and exterior telephonic units (16) in accordance with data in a shared data memory (18) with data that is alterable in response to signals generated by the communication units. A data memory access system (23) periodically shifts access to the data memory (18) from one special process to another successive special process in a process memory (20). The data memory access system (23) prevents the normal periodic shifting of access to the data memory (18) during a period when the special process with access is enabled to access and alter shared data.

Journal ArticleDOI
TL;DR: The allocation and disposal of memory is a ubiquitous operation in most programs, but in some applications, programmers use domain-specific knowledge in an attempt to improve the speed or memory utilization of memory allocators.
Abstract: The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. Yet, in some applications, programmers use domain-specific knowledge in an attempt to improve the speed or memory utilization of memory allocators. In this paper, we describe a program (CustoMalloc) that synthesizes a memory allocator customized for a specific application. Our experiments show that the synthesized allocators are uniformly faster and more space efficient than the Berkeley UNIX allocator. Constructing a custom allocator requires little programmer effort, usually taking only a few minutes. Experience has shown that the synthesized allocators are not overly sensitive to properties of input sets and the resulting allocators are superior even to domain-specific allocators designed by programmers. Measurements show that synthesized allocators are from two to ten times faster than widely-used allocators.

Journal ArticleDOI
01 Mar 1993
TL;DR: A flexible memory management scheme is designed which adapts well to a variation in the size of the working area and/or the number of processors, and the techniques are useful when designing an efficient memory management scheme for a wider range of parallel applications.
Abstract: This article addresses the problems of memory management in a parallel sparse matrix factorization based on a multifrontal approach. We describe how we have adapted and modified the ideas of Duff and Reid, used in a sequential symmetric multifrontal method, to design an efficient memory management scheme for parallel sparse matrix factorization. With our solution, using the minimum size of the working area to run the multifrontal method on a multiprocessor, we can exploit only a part of the parallelism of the method. If we slightly increase the size of the working space, then most of the potential parallelism of the method can be exploited. We have designed a flexible memory management scheme which adapts well to a variation in the size of the working area and/or the number of processors. General parallel applications can always be represented in terms of a computational graph, which is effectively the underlying structure of a parallel multifrontal method. Therefore, we believe that the techniques presented here are useful when designing an efficient memory management scheme for a wider range of parallel applications.

Proceedings Article
24 Aug 1993
TL;DR: A dynamic adaptive scheme which integrates scheduling and memory allocation is developed and is shown to perform effectively under widely varying workloads.
Abstract: This paper studies the problem of memory allocation and scheduling in a multiple query workload with widely varying resource requirements. Several memory allocation and scheduling schemes are presented and their performance is compared using a detailed simulation study. The results demonstrate the inadequacies of static schemes with fixed scheduling and memory allocation policies. A dynamic adaptive scheme which integrates scheduling and memory allocation is developed and is shown to perform effectively under widely varying workloads.

Journal ArticleDOI
N.S. Bowen1, D.K. Pradhan
TL;DR: A taxonomy for processor and memory techniques based on the memory hierarchy is presented, which provides a basis for understanding subtle differences among the various schemes.
Abstract: Several hardware-based techniques that support checkpoint and rollback recovery are presented. The focus is on hardware schemes for uniprocessors, shared-memory multiprocessors, and distributed virtual-memory systems. A taxonomy for processor and memory techniques based on the memory hierarchy is presented; this provides a basis for understanding subtle differences among the various schemes. Processor-based schemes that handle transient faults by using processor-based transparent rollback techniques, and memory-based schemes that roll back data instead of instructions and can be integrated with the processor techniques or exploited by higher levels of software, are discussed.

Proceedings ArticleDOI
01 Oct 1993
TL;DR: In this paper, an extensible user-level virtual memory system based on a metaobject protocol with an innovative graphical performance monitor is presented to make the task of implementing a new application-specific page replacement policy considerably simpler.
Abstract: The operating system's virtual memory management policy is increasingly important to application performance because gains in processing speed are far outstripping improvements in disk latency. Indeed, large applications can gain large performance benefits from using a virtual memory policy tuned to their specific memory access patterns rather than a general policy provided by the operating system. As a result, a number of schemes have been proposed to allow for application-specific extensions to virtual memory management. These schemes have the potential to improve performance; however, to realize this performance gain, application developers must implement their own virtual memory module, a non-trivial programming task. Current operating systems and programming tools are inadequate for developing application-specific policies. Our work combines (i) an extensible user-level virtual memory system based on a metaobject protocol with (ii) an innovative graphical performance monitor to make the task of implementing a new application-specific page replacement policy considerably simpler. The techniques presented for opening up operating system virtual memory policy to user control are general; they could be used to build application-specific implementations of other operating system policies.

Patent
30 Nov 1993
TL;DR: In this paper, a low-cost computer system which includes a single shared memory that can be independently accessible as graphics memory or main store system memory without performance degradation is presented, which alleviates any need to oversize the display memory, yet realizes the cost effectiveness of using readily available memory sizes.
Abstract: The present invention provides a low-cost computer system which includes a single shared memory that can be independently accessible as graphics memory or main store system memory without performance degradation. Because the 'appetite' for main system memory (unlike that of a display memory) is difficult to satisfy, the memory granularity problem can be addressed by programmably reallocating an unused portion of a display memory for system memory use. Reallocation of the unused display memory alleviates any need to oversize the display memory, yet realizes the cost effectiveness of using readily available memory sizes. Further, reallocation of the graphics memory avoids any need to separately consider both the system memory and the display memory in accommodating worst case operational requirements.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: The authors present the model, describe how it supports sequential programs, message-passing programs, and shared virtual memory systems, discuss several design issues, and show preliminary results of a prototype implementation on an Intel iPSC/860.
Abstract: A virtual memory management technique for multicomputers called memory servers is investigated. The memory server model extends the memory hierarchy of multicomputers by introducing a remote memory server layer. Memory servers are multicomputer nodes whose memory is used for fast backing storage and logically lies between the local physical memory and disks. The authors present the model, describe how the model supports sequential programs, message-passing programs, and shared virtual memory systems, discuss several design issues, and show preliminary results of a prototype implementation on an Intel iPSC/860.