
Showing papers on "Memory management published in 1993"


Journal ArticleDOI
TL;DR: This paper proposes a new methodology for constructing lock-free and wait-free implementations of concurrent objects, presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate by applying atomic read, write, load_linked, and store_conditional operations to a shared memory.
Abstract: A concurrent object is a data structure shared by concurrent processes. Conventional techniques for implementing concurrent objects typically rely on critical sections, ensuring that only one process at a time can operate on the object. However, critical sections are poorly suited for asynchronous systems: if one process is halted or delayed in a critical section, other, nonfaulty processes will be unable to progress. By contrast, a concurrent object implementation is lock-free if it always guarantees that some process will complete an operation in a finite number of steps, and it is wait-free if it guarantees that each process will complete an operation in a finite number of steps. This paper proposes a new methodology for constructing lock-free and wait-free implementations of concurrent objects. The object's representation and operations are written as stylized sequential programs, with no explicit synchronization. Each sequential operation is automatically transformed into a lock-free or wait-free operation using novel synchronization and memory management algorithms. These algorithms are presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate by applying atomic read, write, load_linked, and store_conditional operations to a shared memory.
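
As a sketch of the retry pattern such a transformation produces, consider incrementing a shared counter lock-free: read the current version, build a private updated copy, and publish it atomically, retrying on interference. This minimal C example uses C11 compare-exchange as a stand-in for the paper's load_linked/store_conditional primitives; all names are illustrative, and safely reclaiming the displaced copy is exactly the memory management problem the paper's algorithms address.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object: a counter. All names are illustrative. */
typedef struct { long value; } object_t;

static _Atomic(object_t *) shared_obj;

void shared_obj_init(void) {
    atomic_store(&shared_obj, calloc(1, sizeof(object_t)));
}

/* Apply a sequential operation lock-free: copy, mutate, publish.
 * C11 compare-exchange stands in for load_linked/store_conditional;
 * real LL/SC also avoids the ABA problem this sketch glosses over. */
long lock_free_increment(void) {
    for (;;) {
        object_t *old = atomic_load(&shared_obj);   /* "load_linked" */
        object_t *copy = malloc(sizeof *copy);
        copy->value = old->value + 1;               /* sequential op on a private copy */
        /* "store_conditional": succeeds only if nobody replaced old. */
        if (atomic_compare_exchange_weak(&shared_obj, &old, copy))
            return copy->value;   /* old must still be reclaimed safely --
                                     the paper's memory management problem */
        free(copy);                                 /* conflict: retry */
    }
}
```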

553 citations


Journal ArticleDOI
TL;DR: Experiments conducted on a 96-node Butterfly GP-1000 clearly show the advantage of trapezoid self-scheduling over other well-known self-scheduling approaches.
Abstract: A practical processor self-scheduling scheme, trapezoid self-scheduling, is proposed for arbitrary parallel nested loops in shared-memory multiprocessors. Generally, loops are the richest source of parallelism in parallel programs. Dynamically allocating loop iterations to processors achieves load balancing among processors at the expense of run-time scheduling overhead. By linearly decreasing the chunk size at run time, the proposed trapezoid self-scheduling approach obtains the best tradeoff between scheduling overhead and balanced workload. Due to its simplicity and flexibility, this approach can be efficiently implemented in any parallel compiler. The small and predictable number of chores also allows efficient management of memory in a static fashion. Experiments conducted on a 96-node Butterfly GP-1000 clearly show the advantage of trapezoid self-scheduling over other well-known self-scheduling approaches.
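
A minimal C sketch of the linearly decreasing chunk schedule (assuming the commonly cited defaults of a first chunk f = n/(2p) and a last chunk of 1, and n much larger than p; the exact parameters are the paper's to choose):

```c
#include <stdio.h>

/* Trapezoid self-scheduling chunk sizes: decrease linearly from a
 * first chunk f to a last chunk l. Assumes n >> p so that f >= 1
 * and at least two chunks exist. */
void tss_chunks(long n, int p) {
    long f = n / (2 * p);                   /* first (largest) chunk */
    long l = 1;                             /* last (smallest) chunk */
    long steps = (2 * n) / (f + l);         /* number of chunks, about 2n/(f+l) */
    double delta = (double)(f - l) / (double)(steps - 1);  /* per-chunk decrement */
    double size = (double)f;
    long assigned = 0;
    for (long i = 0; i < steps && assigned < n; i++) {
        long chunk = (long)size;
        if (chunk > n - assigned) chunk = n - assigned;
        printf("chunk %ld: %ld iterations\n", i, chunk);
        assigned += chunk;
        size -= delta;
    }
    if (assigned < n)                       /* leftover from rounding */
        printf("tail chunk: %ld iterations\n", n - assigned);
}
```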

279 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: It is shown that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.
Abstract: This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our algorithms produce an SPMD (single program multiple data) program to be run on each processor. Our compiler generates the necessary receive and send instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large messages, allocates space locally on each processor, and translates global data addresses to local addresses. Our techniques are based on an exact data-flow analysis on individual array element accesses. Unlike data dependence analysis, this analysis determines whether two dynamic instances refer to the same value, not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis. Our technique is based on a uniform framework, in which data decompositions, computation decompositions, and the data flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation, and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.
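
One standard way to project a polyhedron described by linear inequalities onto a lower-dimensional space is Fourier-Motzkin elimination; the excerpt above does not name the paper's projection method, so the following is only an illustrative instance. To eliminate a variable, combine each of its lower bounds with each of its upper bounds:

\[
\{(x, y) : 0 \le x,\; x \le y,\; y \le N\} \;\xrightarrow{\text{eliminate } y}\; \{x : 0 \le x \le N\},
\]

since the lower bound \(x \le y\) and the upper bound \(y \le N\) combine to \(x \le N\). Applied to the inequalities describing a computation decomposition, such a projection yields, for example, the set of loop iterations a given processor must execute.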

241 citations


Patent
23 Feb 1993
TL;DR: A memory management and protection system for achieving high-speed execution and proper, flexible memory access control for multiple programs sharing an identical logical address space. Memory access is permitted according to a segment identifier identifying a segment in the logical address space, together with memory protection information for a region in each segment, including a target right permission indicating the rights assigned to memory accesses from the region to each of the segments and an execution permission indicating the type of access permitted by the right permission.
Abstract: A memory management and protection system for realizing high-speed execution and proper, flexible memory access control for multiple programs sharing an identical logical address space. In the system, memory access is permitted according to a segment identifier identifying a segment in the logical address space, and memory protection information for a region in each segment, including a target right permission indicating the rights assigned to make memory accesses from the region to each of the segments, and an execution permission indicating the type of memory access permitted by the right permission. Alternatively, a memory access can be permitted by using an access control list attached to each address table entry, which stores a plurality of program numbers identifying programs permitted to access the logical address stored in that entry; the one matching the current program number is searched for among them. It is also preferable to allocate a plurality of programs, within the limit of the available memory protection capacity, to an identical logical address space, without any overlap between adjacently allocated address regions.
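
A rough C sketch of the access-control-list variant (all structure and function names are ours, not the patent's): the entry stores the program numbers permitted to access its logical address, and the check searches for the current program number among them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout of an address-table entry with an attached
 * access control list of program numbers permitted to access the
 * logical address stored in the entry. */
#define ACL_SLOTS 4

typedef struct {
    unsigned long logical_addr;
    unsigned      acl[ACL_SLOTS];   /* program numbers with access */
    size_t        acl_len;
} addr_table_entry;

/* Permit the access only if the current program number appears
 * in the entry's access control list. */
bool access_permitted(const addr_table_entry *e, unsigned current_program) {
    for (size_t i = 0; i < e->acl_len; i++)
        if (e->acl[i] == current_program)
            return true;
    return false;
}
```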

195 citations


Patent
Darrell L. Cox1
12 Oct 1993
TL;DR: An expandable memory system as discussed by the authors includes a central memory controller and one or more plug-in memory modules, each memory module having an on-board memory module controller coupled in a serial network architecture which forms a memory command link.
Abstract: An expandable memory system including a central memory controller and one or more plug-in memory modules, each memory module having an on-board memory module controller coupled in a serial network architecture which forms a memory command link. Each memory module controller is serially linked to the central memory controller. The memory system is automatically configured by the central controller: each memory module in the system is assigned a base address, in turn, to define a contiguous memory space without user intervention or the requirement to physically reset switches. The memory system includes the capability to disable and bypass bad memory modules and reassign memory addresses without leaving usable memory unallocated.
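
A sketch of the auto-configuration pass in C (structure names are invented for illustration): walk the modules in serial-link order, bypass bad ones, and hand each good module the next base address so the memory space stays contiguous.

```c
/* Central-controller view of the modules on the serial command link. */
typedef struct {
    unsigned long size;     /* module capacity in bytes */
    int           bad;      /* failed self-test? */
    unsigned long base;     /* assigned base address */
    int           enabled;
} mem_module;

void configure(mem_module *mods, int n) {
    unsigned long next_base = 0;
    for (int i = 0; i < n; i++) {
        if (mods[i].bad) {              /* disable and bypass */
            mods[i].enabled = 0;
            continue;
        }
        mods[i].base = next_base;       /* contiguous assignment, in turn */
        mods[i].enabled = 1;
        next_base += mods[i].size;      /* no usable memory left unallocated */
    }
}
```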

178 citations


Journal ArticleDOI
TL;DR: The design of the Sparcle chip, which incorporates mechanisms required for massively parallel systems in a Sparc RISC core, is described, and its fine-grain computation, memory latency tolerance, and efficient message interface are discussed.
Abstract: The design of the Sparcle chip, which incorporates mechanisms required for massively parallel systems in a Sparc RISC core, is described. Coupled with a communications and memory management chip (CMMU), Sparcle allows a fast, 14-cycle context switch, an 8-cycle user-level message send, and fine-grain full/empty-bit synchronization. Sparcle's fine-grain computation, memory latency tolerance, and efficient message interface are discussed. The implementation of Sparcle as a CPU for the Alewife machine is described.

170 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: Research is described that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated; this can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
Abstract: Dynamic storage allocation is used heavily in many application areas including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated. Using five significant, allocation-intensive C programs, we show that a large fraction of all bytes allocated are short-lived (> 90% in all cases). Furthermore, we describe an algorithm for lifetime prediction that accurately predicts the lifetimes of 42–99% of all objects allocated. We describe and simulate a storage allocator that takes advantage of lifetime prediction of short-lived objects and show that it can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
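
One plausible way to exploit such predictions, sketched in C (the routing policy and all names are our illustration, not the paper's allocator design): place objects predicted short-lived in a bump-pointer arena that can be reclaimed in bulk, and send the rest to the general allocator.

```c
#include <stdlib.h>

/* Objects predicted short-lived go to a cheap bump-pointer arena that
 * is reclaimed in bulk; the rest fall back to malloc. */
#define ARENA_BYTES (1 << 20)

static _Alignas(16) char arena[ARENA_BYTES];
static size_t arena_top;

void *alloc(size_t n, int predicted_short_lived) {
    n = (n + 15) & ~(size_t)15;             /* keep 16-byte alignment */
    if (predicted_short_lived && arena_top + n <= ARENA_BYTES) {
        void *p = arena + arena_top;        /* fast path: bump the pointer */
        arena_top += n;
        return p;
    }
    return malloc(n);                       /* long-lived, or arena full */
}

/* Reclaim every short-lived object at once. */
void arena_reset(void) { arena_top = 0; }
```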

169 citations


Patent
Peter T. McLean1, Allen Cuccio1
27 Oct 1993
TL;DR: In this paper, a secure memory card is set in a secure mode to prevent unauthorized access to the data stored on the memory card, even when removed from the computer system and subsequently inserted back into that or another computer system.
Abstract: A computer system having a memory card (301) for storing data that is capable of being removed and reinserted and also having the capability of safeguarding the data stored thereon. A password is stored on the memory card. The memory card is set in a secured mode to prevent unauthorized access to the data stored on the memory card. Once the memory card is set in secure mode, it remains in secure mode, even when removed from the computer system (302) and subsequently inserted back into that or another computer system. Access to the data is permitted when the memory card is set in secure mode only if a valid password is provided to the memory card.
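
A card-side state machine consistent with this description might look as follows in C (field and function names are hypothetical; on a real card the password and secure-mode flag would live in nonvolatile storage so they persist across removal):

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    char password[16];   /* stored on the card (nonvolatile in practice) */
    bool secure_mode;    /* persists across removal and reinsertion */
    bool unlocked;       /* reset on every insertion/power-up */
} memory_card;

void card_set_secure(memory_card *c) { c->secure_mode = true; }  /* sticky once set */

void card_power_up(memory_card *c) { c->unlocked = false; }

/* Access is granted only after a valid password in secure mode. */
bool card_unlock(memory_card *c, const char *attempt) {
    if (!c->secure_mode) return true;
    if (strncmp(c->password, attempt, sizeof c->password) == 0)
        c->unlocked = true;
    return c->unlocked;
}

bool card_read_allowed(const memory_card *c) {
    return !c->secure_mode || c->unlocked;
}
```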

155 citations


Proceedings ArticleDOI
01 May 1993
TL;DR: The goal is to quantify the floating point, memory, I/O and communication requirements of highly parallel scientific applications that perform explicit communication and develop analytical models for the effects of changing the problem size and the degree of parallelism.
Abstract: This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers of processors, we develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications. We use the results to evaluate the trade-offs in the design of multicomputer architectures.

141 citations


Patent
29 Jun 1993
TL;DR: In this paper, a memory analysis system analyzes memory "events", i.e., the allocation or deallocation of memory locations associated with the execution of an application program and produces a graphic display associating dynamically allocated memory segments with various program sub-routines.
Abstract: A memory analysis system analyzes memory "events," i.e., the allocation or deallocation of memory locations, associated with the execution of an application program and produces a graphic display associating dynamically allocated memory segments with various program sub-routines. The system communicates with a debugger to trace the memory allocation routines back through the source code version of the application program and produce a call-stack, which lists the various source code sub-routines associated with the allocation of the segment and makes available the applicable lines of the source code. The system assigns to each of these locations a segment type, which relates to the program sub-routine that calls for it. The system includes a kernel processor that replaces calls to memory allocation and deallocation routines in the program with substitute routines that include instructions to notify the kernel processor each time a memory event occurs. The kernel processor monitors the response from the operating system and sends to a main processor included in the system a message that indicates the type of event, identifies the memory locations involved, and includes related information from the debugger tables which identifies the associated source code sub-routines. The main processor then controls the graphic display of the information. The kernel and the main processors communicate through a section of global memory that is set up as one or more circular queues. The kernel processor suspends the execution of the application program whenever the memory event queue is full, to maintain the application program and the display in relative synchronism.
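
The substitution of allocation routines can be pictured as simple wrappers that emit an event per call, sketched here in C (the notification channel is a stub; the patent routes events through a shared-memory circular queue to the kernel processor):

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for enqueueing an event onto the monitor's circular queue. */
static void notify_event(const char *kind, void *addr, size_t size) {
    fprintf(stderr, "event %s addr=%p size=%zu\n", kind, addr, size);
}

/* Substitute routines that replace direct calls to malloc/free,
 * notifying the monitor on every memory event. */
void *traced_malloc(size_t size) {
    void *p = malloc(size);
    notify_event("alloc", p, size);
    return p;
}

void traced_free(void *p) {
    notify_event("free", p, 0);
    free(p);
}
```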

129 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: It is shown how the design of a memory allocator can significantly affect the reference locality for various applications, and measurements suggest an allocator design that is both very fast and has good locality of reference.
Abstract: The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simulation of five large allocation-intensive C programs. We show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance by increasing both paging and cache miss rates. While increased paging can be debilitating on any architecture, cache miss rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benefits of space efficiency. At the other extreme, algorithms can expend considerable effort to increase reference locality yet gain little in total execution performance. Our measurements suggest an allocator design that is both very fast and has good locality of reference.

Proceedings ArticleDOI
01 Aug 1993
TL;DR: An elegant deterministic load balancing strategy for distribution sort that is applicable to a wide variety of parallel disks and parallel memory hierarchies, with both single and parallel processors, together with a demonstration of how to sort deterministically in parallel memory hierarchies.
Abstract: We present an elegant deterministic load balancing strategy for distribution sort that is applicable to a wide variety of parallel disks and parallel memory hierarchies with both single and parallel processors. The simplest application of the strategy is an optimal deterministic algorithm for external sorting with multiple disks and parallel processors. In each input/output (I/O) operation, each of the D ≥ 1 disks can simultaneously transfer a block of B contiguous records. Our two measures of performance are the number of I/Os and the amount of work done by the CPU(s); our algorithm is simultaneously optimal for both measures. We also show how to sort deterministically in parallel memory hierarchies. When the processors are interconnected by any sort of a PRAM, our algorithms are optimal for all parallel memory hierarchies; when the interconnection network is a hypercube, our algorithms are either optimal or best-known.
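
For context, the optimal number of I/Os for external sorting, which a simultaneously optimal algorithm must match, is the well-known Aggarwal-Vitter bound (not quoted in this excerpt):

\[
\Theta\!\left(\frac{N}{DB} \cdot \frac{\log(N/B)}{\log(M/B)}\right)
\]

I/Os, where \(N\) is the number of records, \(M\) the internal memory capacity, \(B\) the block size, and \(D\) the number of disks.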

Proceedings ArticleDOI
22 Feb 1993
TL;DR: The KSR 1 bridges the gap between the historical shared memory model and massively parallel processors (MPPs) by delivering the shared memory programming model and all of its benefits in a scalable, highly parallel architecture.
Abstract: The KSR 1 bridges the gap between the historical shared memory model and massively parallel processors (MPPs) by delivering the shared memory programming model and all of its benefits in a scalable, highly parallel architecture. The KSR 1 runs a broad range of mainstream applications, ranging from numerically intensive computation to online transaction processing (OLTP) and database management and inquiry. The use of shared memory makes possible a standards-based open environment. The KSR 1's shared memory programming model is made possible by a new architectural technique called ALLCACHE memory.

Proceedings ArticleDOI
01 May 1993
TL;DR: This work explores software-managed TLB design tradeoffs and their interaction with a range of operating systems including monolithic and microkernel designs, and measures TLB performance for benchmarks running on a MIPS R2000-based workstation.
Abstract: An increasing number of architectures provide virtual memory support through software-managed TLBs. However, software management can impose considerable penalties, which are highly dependent on the operating system's structure and its use of virtual memory. This work explores software-managed TLB design tradeoffs and their interaction with a range of operating systems including monolithic and microkernel designs. Through hardware monitoring and simulations, we explore TLB performance for benchmarks running on a MIPS R2000-based workstation running Ultrix, OSF/1, and three versions of Mach 3.0. Results: new operating systems are changing the relative frequency of different types of TLB misses, some of which may not be efficiently handled by current architectures. For the same application binaries, total TLB service time varies by as much as an order of magnitude under different operating systems. Reducing the handling cost for kernel TLB misses reduces total TLB service time up to 40%. For TLBs between 32 and 128 slots, each doubling of the TLB size reduces total TLB service time up to 50%.
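
The mechanism being measured is small: on a miss the hardware traps to a software handler that walks the page table and refills the TLB. A schematic C version (the single-level page-table layout and the tlb_write primitive are stand-ins, not the MIPS R2000 interface):

```c
#include <stdint.h>

/* Software-managed TLB refill: the hardware traps here on a miss;
 * the handler looks up the translation and writes it into the TLB.
 * tlb_write() stands in for the privileged refill instruction. */
#define PAGE_SHIFT 12
#define PT_ENTRIES (1u << 20)

extern uint32_t page_table[PT_ENTRIES];   /* VPN -> PFN | valid bit */
extern void tlb_write(uint32_t vpn, uint32_t entry);
extern void page_fault(uint32_t vaddr);

void tlb_miss_handler(uint32_t bad_vaddr) {
    uint32_t vpn = bad_vaddr >> PAGE_SHIFT;
    uint32_t pte = page_table[vpn];
    if (pte & 1u)                 /* valid: refill the TLB and retry */
        tlb_write(vpn, pte);
    else                          /* invalid: escalate to a page fault */
        page_fault(bad_vaddr);
}
```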

Proceedings ArticleDOI
07 Nov 1993
TL;DR: A multiport memory allocation problem for hierarchical, i.e. multi-dimensional, data streams is described and a memory allocation algorithm is presented which only considers interconnect costs, but memory size and other cost factors can be taken into account.
Abstract: A multiport memory allocation problem for hierarchical, i.e. multi-dimensional, data streams is described. Memory allocation techniques are used in high level synthesis for foreground and background memory allocation, the design of data format converters, and the design of synchronous inter-processor communication hardware. The techniques presented in this paper differ from other approaches in the sense that data streams are considered to be design entities and are not expanded to individual samples. A formal model for hierarchical data streams is given and a memory allocation algorithm is presented. The algorithm comprises two steps: data routing and assignment of signal delays to memories. A number of sub-problems are formulated as ILP programs. In the presented form, the allocation algorithm only considers interconnect costs, but memory size and other cost factors can be taken into account. The presented work is implemented in the memory allocation tool MEDEA which is part of the PHIDEO synthesis system.

Patent
20 Sep 1993
TL;DR: In this article, a technique for transferring data between system memory which is arranged in pages and an attached storage system is described, which is useful with storage systems that do not support scatter-gather and comprises determining for each data transfer the identity of any requested sector which lies completely within a physical page; and for those sectors which lie within a physically-aligned pages, transferring the sectors directly between secondary storage and memory by DMA.
Abstract: Described is a technique which finds use in transferring data between system memory which is arranged in pages and an attached storage system. In such a paged memory, data which crosses pages having contiguous virtual addresses may map to data which crosses discontiguous physical pages. Scatter-gather is advantageously employed in such a system in order to achieve the transfer data directly between memory and storage usually by Direct Memory Access (DMA). A secondary storage device which supports scatter-gather usually includes hardware which will perform the necessary calculations to transfer the data to and from the correct locations in physical memory. The technique of the present invention is useful with storage systems that do not support scatter-gather and comprises determining for each data transfer the identity of any requested sector which lies completely within a physical page and the identity of any sector which crosses boundaries between discontiguous physical pages; and for those sectors which lie within a physical page, transferring the sectors directly between secondary storage and memory by DMA; and for those sectors which cross said boundaries, transferring each sector to either the memory or secondary storage via an intermediate buffer.
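
The per-sector decision reduces to a page-crossing test, sketched in C (helper functions and the single-page criterion follow the abstract's description; names are ours):

```c
#include <stdint.h>

/* For a controller without scatter-gather: sectors wholly inside one
 * physical page go by direct DMA; sectors straddling a page boundary
 * go through an intermediate (bounce) buffer. */
#define PAGE_SIZE   4096u
#define SECTOR_SIZE 512u

extern uint64_t virt_to_phys(uint64_t vaddr);
extern void dma_transfer(uint64_t phys, uint32_t len);
extern void bounce_transfer(uint64_t vaddr, uint32_t len);

void transfer_sector(uint64_t vaddr) {
    uint64_t first_page = virt_to_phys(vaddr) / PAGE_SIZE;
    uint64_t last_page  = virt_to_phys(vaddr + SECTOR_SIZE - 1) / PAGE_SIZE;
    if (first_page == last_page)          /* lies completely within a page */
        dma_transfer(virt_to_phys(vaddr), SECTOR_SIZE);
    else                                  /* crosses a page boundary */
        bounce_transfer(vaddr, SECTOR_SIZE);
}
```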

Proceedings ArticleDOI
01 Dec 1993
TL;DR: By comparing the performance of implementations that make use of page-protection primitives with others that do not, the results show that for certain applications software solutions outperform those that rely on page-protection or other related virtual memory primitives.
Abstract: Many operating systems allow user programs to specify the protection level (inaccessible, read-only, read-write) of pages in their virtual memory address space, and to handle any protection violations that may occur. Such page-protection techniques have been exploited by several user-level algorithms for applications including generational garbage collection and persistent stores. Unfortunately, modern hardware has made efficient handling of page protection faults more difficult. Moreover, page-sized granularity may not match the natural granularity of a given application. In light of these problems, we reevaluate the usefulness of page-protection primitives in such applications, by comparing the performance of implementations that make use of the primitives with others that do not. Our results show that for certain applications software solutions outperform solutions that rely on page-protection or other related virtual memory primitives.
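
The primitive under evaluation is the classic mprotect/SIGSEGV write-barrier pattern: write-protect a region, catch the fault, record the dirtied page, then unprotect it so the write can proceed. A minimal POSIX sketch (error handling omitted; the dirty-page bookkeeping is left as a comment):

```c
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    void *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1));
    /* ...record `page` as dirty for, e.g., a generational collector... */
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);  /* let the write retry */
}

void install_write_barrier(void *region, size_t len) {
    struct sigaction sa = {0};
    page_size = sysconf(_SC_PAGESIZE);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, 0);
    mprotect(region, len, PROT_READ);     /* write-protect the region */
}
```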

Patent
08 Apr 1993
TL;DR: In this article, the authors present a system including a terminal connected by a transmission line to a central processing unit, the terminal including a memory divided into a program memory and a working memory of the RAM type, the program memory in turn including a volatile memory, a safeguarded memory of EEPROM type or RAM type with batteries, and a resident memory of ROM or PROM type.
Abstract: The present invention relates to a system including a terminal connected by a transmission line to a central processing unit, the terminal including a memory divided into a program memory and a working memory of the RAM type, the program memory in turn including a volatile memory, a safeguarded memory of the EEPROM type or RAM type with batteries, and a resident memory of the ROM or PROM type, characterized in that each of the memories comprising the program memory is divided into a noncertified zone, the terminal including an interpreter program for interpreting between a program written in a high-level universal compact language and the language specific to the microprocessor of the terminal, this interpreter program being capable of access to each of the memory divisions, and a remote loading monitoring program including at least one instruction CHSB, the command word of which is stored in one of the registers and expresses the remote loading possibilities of the various zones.

Journal ArticleDOI
TL;DR: Mtool augments a program with low overhead instrumentation which perturbs the program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks.
Abstract: The authors describe Mtool, a software tool for analyzing performance losses in shared memory parallel programs. Mtool augments a program with low overhead instrumentation which perturbs the program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks. After running the instrumented version of the parallel program, the programmer can use Mtool's window-based user interface to view compute time, memory, and synchronization objects. The authors describe Mtool's low overhead instrumentation methods, memory bottleneck detection technique, and attention focusing mechanisms, contrast Mtool with other approaches, and offer a case study to demonstrate its effectiveness.

Proceedings Article
24 Aug 1993
TL;DR: This paper proposes and evaluates an approach to DBMS memory management that addresses multiclass workloads with per-class response time goals by monitoring per-class database reference frequencies.
Abstract: In this paper we propose and evaluate an approach to DBMS memory management that addresses multiclass workloads with per-class response time goals. It operates by monitoring per-class database reference frequencies as well as the state of the system relative to the goals of each class; the information that it gathers is used to help existing memory allocation and page replacement mechanisms avoid making decisions that may jeopardize performance goals.

Proceedings ArticleDOI
01 Jul 1993
TL;DR: This paper develops abstractions for the programmer to supply optional information about the data reference patterns of the program and demonstrates the effectiveness of these techniques by applying them to several applications chosen from the SPLASH parallel benchmark suite.
Abstract: Large-scale shared memory multiprocessors typically support a multilevel memory hierarchy consisting of per-processor caches, a local portion of shared memory, and remote shared memory. On such machines, the performance of parallel programs is often limited by the high latency of remote memory references. In this paper we explore how knowledge of the underlying memory hierarchy can be used to schedule computation and distribute data structures, and thereby improve data locality. Our study is done in the context of COOL, a concurrent object-oriented language developed at Stanford. We develop abstractions for the programmer to supply optional information about the data reference patterns of the program. This information is used by the runtime system to distribute tasks and objects so that the tasks execute close (in the memory hierarchy) to the objects they reference. We demonstrate the effectiveness of these techniques by applying them to several applications chosen from the SPLASH parallel benchmark suite. Our experience suggests that improving data locality can be simple through a combination of programmer abstractions and smart runtime scheduling.

Proceedings ArticleDOI
01 Aug 1993
TL;DR: Three shared memory simulations on distributed memory machines (DMMs) are presented, which use universal hashing to distribute the shared memory cells over the memory modules of the DMM; the analysis utilizes a new combinatorial lemma, which may be of independent interest.
Abstract: We present three shared memory simulations on distributed memory machines (DMMs), which use universal hashing to distribute the shared memory cells over the memory modules of the DMM. We measure their quality in terms of delay, time-processor efficiency, memory contention (how many requests have to be satisfied by one memory module per simulated step), and simplicity. Further, we take into consideration different rules for resolving access conflicts at the modules of the DMM, in particular the c-collision rule motivated by the idea of communicating between processors and modules using an optical crossbar. All simulations are very simple and deterministic (except for the random choice of the hash functions). In particular, we present the first "deterministic" time-processor optimal simulations with delay O(log n), both on Arbitrary DMMs and 2-collision DMMs. (These models are defined in the paper.) The central idea for the latter simulation also yields a simple "deterministic" simulation of an n-processor PRAM on an n-processor 3-collision DMM with delay bounded by O(log log n) with high probability. For the time analysis of the simulations we utilize a new combinatorial lemma, which may be of independent interest. The lemma concerns events defined by properties of the color classes in random colorings of finite sets. Such events are not independent; the lemma shows that in an important special case such events are "negatively correlated", and thus, for the purpose of upper bounds on certain probabilities, may be treated as if independent.
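
Universal hashing here means choosing the mapping from shared memory cells to modules at random from a small family. For concreteness, a textbook Carter-Wegman family is shown below in C (the paper's exact hash class is not given in this excerpt; names are ours):

```c
#include <stdint.h>
#include <stdlib.h>

/* Universal family h(x) = ((a*x + b) mod p) mod m, mapping shared
 * memory cell addresses (assumed < 2^31 - 1) to one of m modules. */
#define PRIME 2147483647ull   /* Mersenne prime 2^31 - 1 */

typedef struct { uint64_t a, b, m; } uhash;

uhash uhash_pick(uint64_t modules) {
    uhash h;
    h.a = 1 + (uint64_t)rand() % (PRIME - 1);   /* a in [1, p-1] */
    h.b = (uint64_t)rand() % PRIME;             /* b in [0, p-1] */
    h.m = modules;
    return h;
}

/* Module responsible for shared memory cell x. */
uint64_t module_of(const uhash *h, uint64_t x) {
    return ((h->a * x + h->b) % PRIME) % h->m;
}
```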

Patent
29 Apr 1993
TL;DR: In this article, a telephonic switching system (11) with a switch controlled by a central processing unit (12) to interconnect interior communication units (14, 15, 16) in accordance with data in a shared data memory (18) with data that is alterable in response to signals generated by the communication units.
Abstract: A telephonic switching system (11) with a switch (10) controlled by a central processing unit (12) to interconnect interior communication units (14) and exterior telephonic units (16) in accordance with data in a shared data memory (18) with data that is alterable in response to signals generated by the communication units. A data memory access system (23) periodically shifts access to the data memory (18) from one special process to another successive special process in a process memory (20). The data memory access system (23) prevents the normal periodic shifting of access to the data memory (18) during a period when the special process with access is enabled to access and alter shared data.

Journal ArticleDOI
TL;DR: The allocation and disposal of memory is a ubiquitous operation in most programs, but in some applications, programmers use domain-specific knowledge in an attempt to improve the speed or memory utilization of memory allocators.
Abstract: The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. Yet, in some applications, programmers use domain-specific knowledge in an attempt to improve the speed or memory utilization of memory allocators. In this paper, we describe a program (CustoMalloc) that synthesizes a memory allocator customized for a specific application. Our experiments show that the synthesized allocators are uniformly faster and more space efficient than the Berkeley UNIX allocator. Constructing a custom allocator requires little programmer effort, usually taking only a few minutes. Experience has shown that the synthesized allocators are not overly sensitive to properties of input sets and the resulting allocators are superior even to domain-specific allocators designed by programmers. Measurements show that synthesized allocators are from two to ten times faster than widely-used allocators.

Journal ArticleDOI
01 Mar 1993
TL;DR: A flexible memory management scheme is designed which adapts well to a variation in the size of the working area and/or the number of processors, and the techniques are useful when designing an efficient memory management scheme for a wider range of parallel applications.
Abstract: This article addresses the problems of memory management in a parallel sparse matrix factorization based on a multifrontal approach. We describe how we have adapted and modified the ideas of Duff and Reid, used in a sequential symmetric multifrontal method, to design an efficient memory management scheme for parallel sparse matrix factorization. With our solution, using the minimum size of the working area to run the multifrontal method on a multiprocessor, we can exploit only a part of the parallelism of the method. If we slightly increase the size of the working space, then most of the potential parallelism of the method can be exploited. We have designed a flexible memory management scheme which adapts well to a variation in the size of the working area and/or the number of processors. General parallel applications can always be represented in terms of a computational graph, which is effectively the underlying structure of a parallel multifrontal method. Therefore, we believe that the techniques presented here are useful when designing an efficient memory management scheme for a wider range of parallel applications.

Proceedings Article
24 Aug 1993
TL;DR: A dynamic adaptive scheme which integrates scheduling and memory allocation is developed and is shown to perform effectively under widely varying workloads.
Abstract: This paper studies the problem of memory allocation and scheduling in a multiple query workload with widely varying resource requirements. Several memory allocation and scheduling schemes are presented and their performance is compared using a detailed simulation study. The results demonstrate the inadequacies of static schemes with fixed scheduling and memory allocation policies. A dynamic adaptive scheme which integrates scheduling and memory allocation is developed and is shown to perform effectively under widely varying workloads.

Journal ArticleDOI
N.S. Bowen1, D.K. Pradhan
TL;DR: A taxonomy for processor and memory techniques based on the memory hierarchy is presented, which provides a basis for understanding subtle differences among the various schemes.
Abstract: Several hardware-based techniques that support checkpoint and rollback recovery are presented. The focus is on hardware schemes for uniprocessors, shared-memory multiprocessors, and distributed virtual-memory systems. A taxonomy for processor and memory techniques based on the memory hierarchy is presented; this provides a basis for understanding subtle differences among the various schemes. Processor-based schemes that handle transient faults by using processor-based transparent rollback techniques, and memory-based schemes that roll back data instead of instructions and can be integrated with the processor techniques or exploited by higher levels of software, are discussed.

Proceedings ArticleDOI
01 Oct 1993
TL;DR: In this paper, an extensible user-level virtual memory system based on a metaobject protocol with an innovative graphical performance monitor is presented to make the task of implementing a new application-specific page replacement policy considerably simpler.
Abstract: The operating system's virtual memory management policy is increasingly important to application performance because gains in processing speed are far outstripping improvements in disk latency. Indeed, large applications can gain large performance benefits from using a virtual memory policy tuned to their specific memory access patterns rather than a general policy provided by the operating system. As a result, a number of schemes have been proposed to allow for application-specific extensions to virtual memory management. These schemes have the potential to improve performance; however, to realize this performance gain, application developers must implement their own virtual memory module, a non-trivial programming task. Current operating systems and programming tools are inadequate for developing application-specific policies. Our work combines (i) an extensible user-level virtual memory system based on a metaobject protocol with (ii) an innovative graphical performance monitor to make the task of implementing a new application-specific page replacement policy considerably simpler. The techniques presented for opening up operating system virtual memory policy to user control are general; they could be used to build application-specific implementations of other operating system policies.

Patent
30 Nov 1993
TL;DR: In this paper, a low-cost computer system which includes a single shared memory that can be independently accessible as graphics memory or main store system memory without performance degradation is presented, which alleviates any need to oversize the display memory, yet realizes the cost effectiveness of using readily available memory sizes.
Abstract: The present invention provides a low-cost computer system which includes a single shared memory that can be independently accessible as graphics memory or main store system memory without performance degradation. Because the 'appetite' for main system memory (unlike that of a display memory) is difficult to satisfy, the memory granularity problem can be addressed by programmably reallocating an unused portion of a display memory for system memory use. Reallocation of the unused display memory alleviates any need to oversize the display memory, yet realizes the cost effectiveness of using readily available memory sizes. Further, reallocation of the graphics memory avoids any need to separately consider both the system memory and the display memory in accommodating worst case operational requirements.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: The authors present the model, describe how it supports sequential programs, message-passing programs, and shared virtual memory systems, discuss several design issues, and show preliminary results of a prototype implementation on an Intel iPSC/860.
Abstract: A virtual memory management technique for multicomputers called memory servers is investigated. The memory server model extends the memory hierarchy of multicomputers by introducing a remote memory server layer. Memory servers are multicomputer nodes whose memory is used for fast backing storage and logically lies between the local physical memory and disks. The authors present the model, describe how the model supports sequential programs, message-passing programs, and shared virtual memory systems, discuss several design issues, and show preliminary results of a prototype implementation on an Intel iPSC/860.