
Showing papers on "Memory management published in 2002"


Proceedings Article
10 Jun 2002
TL;DR: This paper examines safety violations enabled by C’s design, and shows how Cyclone avoids them, without giving up C’s hallmark control over low-level details such as data representation and memory management.
Abstract: Cyclone is a safe dialect of C. It has been designed from the ground up to prevent the buffer overflows, format string attacks, and memory management errors that are common in C programs, while retaining C’s syntax and semantics. This paper examines safety violations enabled by C’s design, and shows how Cyclone avoids them, without giving up C’s hallmark control over low-level details such as data representation and memory management.
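
As a minimal illustration, in plain C, of the class of unchecked write the paper targets (the function and input string are invented for the example): standard C compiles this silently, whereas Cyclone's checked pointer types would reject or trap the overflow.

    /* A classic buffer overflow that standard C accepts without complaint.
     * Cyclone's bounds-checked pointer types catch this kind of access
     * instead of letting it corrupt adjacent memory. */
    #include <string.h>

    void copy_name(const char *src) {
        char buf[8];
        strcpy(buf, src);   /* no bounds check: overflows when strlen(src) >= 8 */
    }

    int main(void) {
        copy_name("far longer than eight bytes");  /* undefined behavior in C */
        return 0;
    }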

777 citations


Patent
Jian Chen
13 Sep 2002
TL;DR: In this article, a flash nonvolatile memory system that normally operates its memory cells in multiple storage states is provided with the ability to operate some selected or all of its memory cell blocks in two states instead.
Abstract: A flash non-volatile memory system that normally operates its memory cells in multiple storage states is provided with the ability to operate some selected or all of its memory cell blocks in two states instead. The two states are selected to be the furthest separated of the multiple states, thereby providing an increased margin during two state operation. This allows faster programming and a longer operational life of the memory cells being operated in two states when it is more desirable to have these advantages than the increased density of data storage that multi-state operation provides.

703 citations


Journal ArticleDOI
TL;DR: Both model-based and real trace simulation studies show that the proposed cooperative architecture results in more than 50% memory saving and substantial central processing unit (CPU) power saving for the management and update of cache entries compared with the traditional uncooperative hierarchical caching architecture.
Abstract: This paper aims at finding fundamental design principles for hierarchical Web caching. An analytical modeling technique is developed to characterize an uncooperative two-level hierarchical caching system where the least recently used (LRU) algorithm is locally run at each cache. With this modeling technique, we are able to identify a characteristic time for each cache, which plays a fundamental role in understanding the caching processes. In particular, a cache can be viewed roughly as a low-pass filter with its cutoff frequency equal to the inverse of the characteristic time. Documents with access frequencies lower than this cutoff frequency have good chances to pass through the cache without cache hits. This viewpoint enables us to take any branch of the cache tree as a tandem of low-pass filters at different cutoff frequencies, which further results in the finding of two fundamental design principles. Finally, to demonstrate how to use the principles to guide the caching algorithm design, we propose a cooperative hierarchical Web caching architecture based on these principles. Both model-based and real trace simulation studies show that the proposed cooperative architecture results in more than 50% memory saving and substantial central processing unit (CPU) power saving for the management and update of cache entries compared with the traditional uncooperative hierarchical caching architecture.
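
As a rough illustration of the characteristic-time idea, the sketch below solves for the characteristic time T of a single LRU cache under an independent-reference (Poisson) model, where a document with request rate lambda hits with probability 1 - exp(-lambda*T); the low-pass behavior shows in how hit probability falls off for rates below 1/T. The Zipf-like rates and cache size are illustrative, not from the paper.

    /* Sketch: estimating the characteristic time T of an LRU cache under an
     * independent-reference (Poisson) model.  T solves
     *   sum_i (1 - exp(-lambda_i * T)) = C   (cache size in documents);
     * a document with rate lambda then hits with probability 1 - exp(-lambda*T),
     * so the cache behaves as a low-pass filter with cutoff ~ 1/T. */
    #include <math.h>
    #include <stdio.h>

    static double filled(const double *lambda, int n, double t) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += 1.0 - exp(-lambda[i] * t);
        return s;
    }

    static double characteristic_time(const double *lambda, int n, double c) {
        double lo = 0.0, hi = 1.0;
        while (filled(lambda, n, hi) < c) hi *= 2.0;   /* bracket the root */
        for (int it = 0; it < 100; it++) {             /* bisection */
            double mid = 0.5 * (lo + hi);
            if (filled(lambda, n, mid) < c) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    int main(void) {
        enum { N = 1000 };
        double lambda[N];
        for (int i = 0; i < N; i++) lambda[i] = 1.0 / (i + 1);  /* Zipf-like */

        double T = characteristic_time(lambda, N, 100.0);  /* 100-doc cache */
        printf("T = %.2f  hit(rank 1) = %.3f  hit(rank 500) = %.3f\n",
               T, 1.0 - exp(-lambda[0] * T), 1.0 - exp(-lambda[499] * T));
        return 0;
    }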

512 citations


Proceedings ArticleDOI
17 May 2002
TL;DR: This paper focuses on the region-based memory management of Cyclone and its static typing discipline, and combines default annotations, local type inference, and a novel treatment of region effects to reduce the annotation burden.
Abstract: Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory management of Cyclone and its static typing discipline. The design incorporates several advancements, including support for region subtyping and a coherent integration with stack allocation and a garbage collector. To support separate compilation, Cyclone requires programmers to write some explicit region annotations, but a combination of default annotations, local type inference, and a novel treatment of region effects reduces this burden. As a result, we integrate C idioms in a region-based framework. In our experience, porting legacy C to Cyclone has required altering about 8% of the code; of the changes, only 6% (of the 8%) were region annotations.
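
For readers unfamiliar with regions, here is a minimal arena-style region allocator in plain C, a sketch of the discipline Cyclone types statically: objects are bump-allocated and the whole region is freed at once. Cyclone's contribution is making "no access after the region dies" a compile-time guarantee; this sketch has no such checking.

    /* A minimal arena-style region allocator: objects are bump-allocated
     * into the region and all freed at once when the region dies.
     * Error handling for malloc is elided. */
    #include <stdlib.h>

    typedef struct region {
        char  *base;
        size_t used, cap;
    } region;

    region *region_create(size_t cap) {
        region *r = malloc(sizeof *r);
        r->base = malloc(cap);
        r->used = 0;
        r->cap  = cap;
        return r;
    }

    void *region_alloc(region *r, size_t n) {
        n = (n + 7) & ~(size_t)7;               /* keep 8-byte alignment */
        if (r->used + n > r->cap) return NULL;  /* real code grows a page list */
        void *p = r->base + r->used;
        r->used += n;
        return p;
    }

    void region_free(region *r) {   /* frees every object in the region */
        free(r->base);
        free(r);
    }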

407 citations


Journal ArticleDOI
TL;DR: This article presents a compiler strategy that automatically partitions the data among the memory units, and shows that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data.
Abstract: This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software. Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.
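
A sketch of the simpler greedy baseline the abstract compares against: variables are placed in scratch-pad SRAM in decreasing order of profiled accesses per byte until the budget runs out. The paper's contribution is a formulation that is provably optimal relative to the profile; the variable names and numbers below are invented.

    /* Greedy profile-guided partition: sort by access density and fill SRAM. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { const char *name; size_t bytes; long accesses; int in_sram; } var;

    static int by_density(const void *a, const void *b) {
        const var *x = a, *y = b;
        double dx = (double)x->accesses / x->bytes;
        double dy = (double)y->accesses / y->bytes;
        return (dx < dy) - (dx > dy);          /* descending density */
    }

    void partition(var *v, int n, size_t sram_budget) {
        qsort(v, n, sizeof *v, by_density);
        for (int i = 0; i < n; i++) {
            v[i].in_sram = (v[i].bytes <= sram_budget);
            if (v[i].in_sram) sram_budget -= v[i].bytes;
        }
    }

    int main(void) {
        var v[] = { {"coeffs", 256, 90000, 0}, {"frame", 4096, 120000, 0},
                    {"lut",    512, 80000, 0}, {"log",   2048,    500, 0} };
        partition(v, 4, 1024);                 /* 1 KB scratch-pad (illustrative) */
        for (int i = 0; i < 4; i++)
            printf("%-6s -> %s\n", v[i].name, v[i].in_sram ? "SRAM" : "DRAM");
        return 0;
    }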

338 citations


Proceedings ArticleDOI
01 Oct 2002
TL;DR: This work extends MMP to support segment translation which allows a memory segment to appear at another location in the address space, and uses this translation to implement zero-copy networking underneath the standard read system call interface.
Abstract: Mondrian memory protection (MMP) is a fine-grained protection scheme that allows multiple protection domains to flexibly share memory and export protected services. In contrast to earlier page-based systems, MMP allows arbitrary permissions control at the granularity of individual words. We use a compressed permissions table to reduce space overheads and employ two levels of permissions caching to reduce run-time overheads. The protection tables in our implementation add less than 9% overhead to the memory space used by the application. Accessing the protection tables adds less than 8% additional memory references to the accesses made by the application. Although it can be layered on top of demand-paged virtual memory, MMP is also well-suited to embedded systems with a single physical address space. We extend MMP to support segment translation which allows a memory segment to appear at another location in the address space. We use this translation to implement zero-copy networking underneath the standard read system call interface, where packet payload fragments are connected together by the translation system to avoid data copying. This saves 52% of the memory references used by a traditional copying network stack.
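
The sketch below models the core MMP lookup in software: a two-level table maps each 32-bit word address to a 2-bit permission value. The real design compresses the tables and caches entries in hardware; the level split and encodings here are illustrative.

    /* Word-granularity permission lookup in the spirit of MMP. */
    #include <stdint.h>

    enum perm { PERM_NONE = 0, PERM_RO = 1, PERM_RW = 2, PERM_EX = 3 };

    #define L1_ENTRIES (1u << 10)   /* top 10 bits of the word index */
    #define L2_WORDS   (1u << 20)   /* each leaf covers 2^20 words   */

    typedef struct { uint8_t *leaf[L1_ENTRIES]; } perm_table;

    unsigned perm_lookup(const perm_table *t, uint32_t addr) {
        uint32_t word = addr >> 2;                    /* 32-bit word index  */
        uint32_t i1 = word >> 20;
        uint32_t i2 = word & (L2_WORDS - 1);
        const uint8_t *leaf = t->leaf[i1];
        if (!leaf) return PERM_NONE;                  /* unmapped: no access */
        return (leaf[i2 >> 2] >> ((i2 & 3) * 2)) & 3; /* 2 bits per word     */
    }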

322 citations


Journal ArticleDOI
TL;DR: This work explores the data reuse properties of full-search block-matching for motion estimation (ME) and associated architecture designs, as well as memory bandwidth requirements, and a seven-type classification system is developed that can accommodate most published ME architectures.
Abstract: This work explores the data reuse properties of full-search block-matching (FSBM) for motion estimation (ME) and associated architecture designs, as well as memory bandwidth requirements. Memory bandwidth in high-quality video is a major bottleneck to designing an implementable architecture because of large frame size and search range. First, the memory bandwidth in ME is analyzed and the problem is solved by exploring data reuse. Four levels are defined according to the degree of data reuse for previous frame access. With the highest level of data reuse, one-access for frame pixels is achieved. A scheduling strategy is also applied to data reuse of the ME architecture designs and a seven-type classification system is developed that can accommodate most published ME architectures. This classification can simplify the work of designers in designing more cost-effective ME architectures, while simultaneously minimizing memory bandwidth. Finally, a FSBM architecture suitable for high quality HDTV video with a minimum memory bandwidth feature is proposed. Our architecture is able to achieve 100% hardware efficiency while preserving minimum I/O pin count, low local memory size, and bandwidth.
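
For context, this is the full-search block-matching kernel the paper's reuse analysis applies to, as a plain C sketch: every candidate offset in a +/-R window is scored by sum of absolute differences, so search-window pixels are re-read for many candidates, which is exactly the reuse opportunity the four levels classify. The caller is assumed to guarantee an R-pixel margin around the reference block.

    /* SAD over a 16x16 block, then exhaustive search over a +/-R window. */
    #include <stdlib.h>

    int sad_16x16(const unsigned char *cur, const unsigned char *ref, int stride) {
        int s = 0;
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                s += abs(cur[y * stride + x] - ref[y * stride + x]);
        return s;
    }

    void fsbm(const unsigned char *cur, const unsigned char *ref, int stride,
              int R, int *best_dx, int *best_dy) {
        int best = 1 << 30;
        for (int dy = -R; dy <= R; dy++)        /* every candidate offset */
            for (int dx = -R; dx <= R; dx++) {
                int s = sad_16x16(cur, ref + dy * stride + dx, stride);
                if (s < best) { best = s; *best_dx = dx; *best_dy = dy; }
            }
    }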

308 citations


Patent
14 Mar 2002
TL;DR: In this paper, a smart memory computing system that uses smart memory for massive data storage as well as for massive parallel execution is described, where the data stored in the smart memory can be accessed just like the conventional main memory, but the execution units also have many execution units to process data in situ.
Abstract: A smart memory computing system that uses smart memory for massive data storage as well as for massive parallel execution is disclosed. The data stored in the smart memory can be accessed just like the conventional main memory, but the smart memory also has many execution units to process data in situ. The smart memory computing system offers improved performance and reduced costs for those programs having massive data-level parallelism. This smart memory computing system is able to take advantage of data-level parallelism to improve execution speed by, for example, use of inventive aspects such as algorithm mapping, compiler techniques, architecture features, and specialized instruction sets.

196 citations


Proceedings ArticleDOI
04 Nov 2002
TL;DR: The results indicate that programmers needing fast regions should use reaps, and that most programmers considering custom allocators should instead use the Lea allocator.
Abstract: Programmers hoping to achieve performance improvements often use custom memory allocators. This in-depth study examines eight applications that use custom allocators. Surprisingly, for six of these applications, a state-of-the-art general-purpose allocator (the Lea allocator) performs as well as or better than the custom allocators. The two exceptions use regions, which deliver higher performance (improvements of up to 44%). Regions also reduce programmer burden and eliminate a source of memory leaks. However, we show that the inability of programmers to free individual objects within regions can lead to a substantial increase in memory consumption. Worse, this limitation precludes the use of regions for common programming idioms, reducing their usefulness. We present a generalization of general-purpose and region-based allocators that we call reaps. Reaps are a combination of regions and heaps, providing a full range of region semantics with the addition of individual object deletion. We show that our implementation of reaps provides high performance, outperforming other allocators with region-like semantics. We then use a case study to demonstrate the space advantages and software engineering benefits of reaps in practice. Our results indicate that programmers needing fast regions should use reaps, and that most programmers considering custom allocators should instead use the Lea allocator.
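
A semantic skeleton of the reap idea in C, assuming fixed-size objects for brevity: allocation is region-fast (a bump pointer), but freed objects are threaded onto a free list so individual deletion is possible. The real implementation generalizes this and layers it on Lea-allocator machinery.

    /* Reap skeleton: region-speed allocation plus individual free.
     * obj_size must be at least sizeof(free_obj). */
    #include <stddef.h>

    typedef struct free_obj { struct free_obj *next; } free_obj;

    typedef struct reap {
        char     *bump, *end;     /* region-style bump area     */
        free_obj *free_list;      /* individually freed objects */
        size_t    obj_size;
    } reap;

    void *reap_alloc(reap *r) {
        if (r->free_list) {                   /* reuse a freed object first */
            void *p = r->free_list;
            r->free_list = r->free_list->next;
            return p;
        }
        if (r->bump + r->obj_size > r->end) return NULL;
        void *p = r->bump;                    /* region fast path */
        r->bump += r->obj_size;
        return p;
    }

    void reap_free_obj(reap *r, void *p) {    /* the operation regions lack */
        free_obj *f = p;
        f->next = r->free_list;
        r->free_list = f;
    }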

194 citations


Proceedings ArticleDOI
01 Oct 2002
TL;DR: Content-Directed Data Prefetching is proposed, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems.
Abstract: Although central processor speeds continue to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses. This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches "likely" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of "real world" applications.
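
The heart of the technique can be sketched in a few lines: when a cache line is filled, scan its words and prefetch any value that falls within the heap's bounds, the same "looks like a pointer" test used by conservative garbage collectors. The heap-bound variables and the GCC prefetch builtin below are software stand-ins for the hardware mechanism.

    /* Content-directed scan of a freshly filled cache line. */
    #include <stdint.h>

    static uintptr_t heap_lo, heap_hi;   /* set at startup to heap bounds */

    static void prefetch_addr(uintptr_t a) {
    #if defined(__GNUC__)
        __builtin_prefetch((const void *)a);
    #else
        (void)a;                         /* no portable prefetch; no-op */
    #endif
    }

    void scan_filled_line(const uintptr_t *line, int nwords) {
        for (int i = 0; i < nwords; i++)
            if (line[i] >= heap_lo && line[i] < heap_hi)  /* pointer-like? */
                prefetch_addr(line[i]);
    }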

175 citations


Proceedings ArticleDOI
10 Jun 2002
TL;DR: This work presents an operating system (OS) based solution where the OS scheduler directs the power mode transitions by keeping track of module accesses for each process in the system and shows that the proposed technique is also very robust when different system and workload parameters are modified.
Abstract: Previous work on DRAM power-mode management focused on hardware-based techniques and compiler-directed schemes to explicitly transition unused memory modules to low-power operating modes. While hardware-based techniques require extra logic to keep track of memory references and make decisions about future mode transitions, compiler-directed schemes can only work on a single application at a time and demand sophisticated program analysis support. In this work, we present an operating system (OS) based solution where the OS scheduler directs the power mode transitions by keeping track of module accesses for each process in the system. This global view combined with the flexibility of a software approach brings large energy savings at no extra hardware cost. Our implementation using a full-fledged OS shows that the proposed technique is also very robust when different system and workload parameters are modified, and provides the first set of experimental results for memory energy optimization with a multiprogrammed workload on a real platform. The proposed technique is applicable to both embedded systems and high-end computing platforms.
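
A sketch of the scheduler hook the approach implies, with invented types: each process carries a mask of the memory modules it has touched, and on a context switch the OS enables exactly those modules, letting the rest fall into a low-power mode. The mode-set operation is platform specific and stubbed.

    /* Scheduler-directed module power management, sketched. */
    #include <stdint.h>

    #define NUM_MODULES 8

    typedef struct { uint8_t module_mask; } proc;   /* bit i: uses module i */

    static void set_module_power(int module, int on) {
        (void)module; (void)on;   /* platform-specific controller write */
    }

    void on_context_switch(const proc *next) {
        for (int m = 0; m < NUM_MODULES; m++)
            set_module_power(m, (next->module_mask >> m) & 1);
    }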

Patent
John Garney
08 Aug 2002
TL;DR: In this paper, a write-back mechanism, which may employ security, is responsible for enforcing usage restrictions, such as an expiration date, usage count limit, or data access fee for the acquired data.
Abstract: A destructive-read memory is one in which the process of reading the memory destroys its contents. Such a memory may be used in devices that are intended to acquire data that may have associated usage restrictions, such as an expiration date, usage count limit, or data access fee for the acquired data. Typically, to enforce usage restrictions, and protect against theft, complex and often costly security techniques are applied to acquired data. With destructive-read memory, complex and costly security is not required for stored data. In one embodiment, a write-back mechanism, which may employ security, is responsible for enforcing usage restrictions. If the write-back mechanism determines continued access to acquired data is allowed, then it writes back the data as it is destructively read from the memory.

Proceedings ArticleDOI
10 Jun 2002
TL;DR: This paper shows that a simple reuse vector/matrix abstraction can provide the compiler with useful information in a concise form and indicates that the compiler is very successful both in optimizing code for a given memory hierarchy and in designing a hierarchy with a reasonable performance/size ratio.
Abstract: One of the primary challenges in embedded system design is designing the memory hierarchy and restructuring the application to take advantage of it. This task is particularly important for embedded image and video processing applications that make heavy use of large multi-dimensional arrays of signals and nested loops. In this paper, we show that a simple reuse vector/matrix abstraction can provide the compiler with useful information in a concise form. Using this information, the compiler can either adapt the application to an existing memory hierarchy or propose a memory hierarchy itself. Our initial results indicate that the compiler is very successful both in optimizing code for a given memory hierarchy and in designing a hierarchy with a reasonable performance/size ratio.
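
A tiny example of what the reuse abstraction captures: in the loop nest below, B[j] is reused on every iteration of the outer loop (reuse vector (1,0)), while A[i][j] has no temporal reuse, which tells the compiler that B is the array worth keeping in fast on-chip memory.

    /* Reuse illustration: A is streamed once; B is read N times. */
    #define N 64
    static double A[N][N], B[N], sum;

    void kernel(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += A[i][j] * B[j];   /* B carries reuse across i */
    }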

Patent
05 Sep 2002
TL;DR: In this article, a distributed data processing system for memory management is described, in which memory regions are registered and have access rights and protection domains associated with them; in response to a request for a memory operation that includes a virtual address, the virtual address is used to index into a data structure.
Abstract: A method, computer program product, and distributed data processing system for memory management. Memory regions are registered and have access rights and protection domains associated with them. In response to receiving a request for a memory operation that includes a virtual address, the virtual address is used to index into a data structure. A second data structure is then used to translate the virtual address into physical addresses for the operation. A third data structure is used to allow an incoming request when a remote operation is initiated.

Patent
27 Sep 2002
TL;DR: This article proposes the logical grouping of memory system sectors in a non-volatile memory system in order to increase the memory system's operational speed, which includes allocating sets of contiguous logical sectors containing file data from a host system into logical groups.
Abstract: An embodiment of the present invention includes a method of implementing the logical grouping of memory system sectors in a non-volatile memory system in order to increase the operational speed of the memory system, the method comprising allocating sets of contiguous logical sectors containing file data from a host system into logical groups; ensuring that a logical group includes fewer sectors than there are sector locations in a memory block in the non-volatile memory; aligning the logical groups with the clusters into which the host system organizes sectors containing file data; writing sectors within a logical group to contiguous locations within the non-volatile memory; organizing the non-volatile memory such that the corresponding sector in each logical group is written to a corresponding array within the memory; the arrangement being such that the reading then writing of a sector of a cluster to relocate it to a different location in the non-volatile memory takes place within the same array, thereby allowing concurrent relocation of all sectors in a logical group.

Proceedings ArticleDOI
02 Oct 2002
TL;DR: In this paper, a new software technique is presented which supports the use of an on-chip scratchpad memory by dynamically copying program parts into it with an optimal algorithm using integer linear programming.
Abstract: The number of mobile embedded systems is increasing and all of them are limited in their uptime by their battery capacity. Several hardware changes have been introduced during the last years, but the steadily growing functionality still requires further energy reductions, e.g. through software optimizations. A significant amount of energy can be saved in the memory hierarchy where most of the energy is consumed. In this paper, a new software technique is presented which supports the use of an on-chip scratchpad memory by dynamically copying program parts into it. The set of selected program parts is determined with an optimal algorithm using integer linear programming. Experimental results show a reduction of the energy consumption by nearly 30%, a performance increase by 25% against a common cache system and energy improvements against a static approach of up to 38%.
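
The runtime side of the technique amounts to copy-in code at compiler-chosen program points, as in this sketch; which objects to copy at which points is exactly what the paper's integer linear program decides. The array, sizes, and phase structure are invented.

    /* Dynamic scratch-pad overlay: copy a hot object in at phase entry. */
    #include <string.h>

    static int coeffs[256];    /* hot array normally in slow memory      */
    static int spm[1024];      /* stands in for the scratch-pad window   */

    void enter_hot_phase(void) {
        memcpy(spm, coeffs, sizeof coeffs);   /* copy-in chosen by the ILP */
        /* ... the phase body now reads spm[i] instead of coeffs[i] ... */
    }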

Patent
13 Nov 2002
TL;DR: In this article, a portable communication device may have multiple processors and a memory, and some portions of the memory may only be accessible by one of the processors, while others are accessible by all the processors.
Abstract: Briefly, in accordance with one embodiment of the invention, a portable communication device may have multiple processors and a memory. Portions of the memory may only be accessible by one of the processors.

Patent
04 Dec 2002
TL;DR: In this paper, the authors proposed a shared memory architecture for a GPS receiver, wherein a processing memory is shared among the different processing functions, such as the correlator signal processing, tracking processing, and other applications processing.
Abstract: A shared memory architecture for a GPS receiver, wherein a processing memory is shared among the different processing functions, such as the correlator signal processing, tracking processing, and other applications processing. The shared memory architecture within the GPS receiver provides the memory necessary for signal processing operations, such as the massively parallel processing, while conserving memory cost by re-using that same memory for other GPS and non-GPS applications. The shared memory architecture for a GPS receiver provided in accordance with the principles of this invention thereby significantly minimizes the costly memory requirement often required for extremely fast signal acquisition in a GPS receiver.

Patent
Opher D. Kahn, Jeffrey R. Wilcox
03 Jan 2002
TL;DR: In this paper, the authors propose a method for dynamically adjusting a memory page-closing policy for computer systems employing various types of DRAM memory partitioned into one or more memory banks.
Abstract: A method for dynamically adjusting a memory page-closing policy for computer systems employing various types of DRAM memory partitioned into one or more memory banks, and circuitry for implementing the method. In general, the method comprises monitoring memory accesses to memory banks and dynamically adjusting the memory page-closing policy for each memory bank based on the locality characteristics of its memory accesses so that memory latencies are reduced. In one embodiment, in response to memory requests from a computer system processor, memory accesses to the DRAM memory are made on a page-wise basis. As each memory page is accessed, a page-miss, page-hit, or page-empty state is produced. Depending on the page access states, which generally will reflect the locality characteristics of the application(s) accessing the memory, a page-close set point is adjusted. When a timing count corresponding to the page exceeds the page-close set point, the memory page is closed.
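
A software sketch of the adaptive policy, with invented step sizes and bounds: page hits push the close set point up (keep pages open longer), misses push it down, and a page is closed once its idle time exceeds the set point.

    /* Adaptive page-close policy, sketched. */
    typedef struct {
        int setpoint;           /* idle cycles before a page is closed */
        int min, max, step;
    } page_policy;

    void on_page_access(page_policy *p, int was_hit) {
        if (was_hit && p->setpoint + p->step <= p->max)
            p->setpoint += p->step;        /* good locality: stay open longer */
        else if (!was_hit && p->setpoint - p->step >= p->min)
            p->setpoint -= p->step;        /* poor locality: close sooner */
    }

    int should_close(const page_policy *p, int idle_cycles) {
        return idle_cycles > p->setpoint;
    }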

Patent
James Chow, Thomas K. Gender
03 Jun 2002
TL;DR: In this paper, the authors present a flash memory management system and method with increased performance, which includes a free block mechanism, a disk maintenance mechanism, and a bad block detection mechanism.
Abstract: The present invention provides a flash memory management system and method with increased performance. The flash memory management system provides the ability to efficiently manage and allocate flash memory use in a way that improves reliability and longevity, while maintaining good performance levels. The flash memory management system includes a free block mechanism, a disk maintenance mechanism, and a bad block detection mechanism. The free block mechanism provides efficient sorting of free blocks to facilitate selecting low use blocks for writing. The disk maintenance mechanism provides for the ability to efficiently clean flash memory blocks during processor idle times. The bad block detection mechanism provides the ability to better detect when a block of flash memory is likely to go bad. The flash status mechanism stores information in fast access memory that describes the content and status of the data in the flash disk. The new bank detection mechanism provides the ability to automatically detect when new banks of flash memory are added to the system. Together, these mechanisms provide a flash memory management system that can improve the operational efficiency of systems that utilize flash memory.
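
The free block mechanism can be sketched as wear-aware selection: keep free flash blocks ordered by erase count and always write to the least-worn one. The structures below are invented; a real driver would persist erase counts in flash metadata.

    /* Wear-aware free-block selection, sketched. */
    #include <stdlib.h>

    typedef struct { int block_no; unsigned erase_count; } free_block;

    static int by_wear(const void *a, const void *b) {
        const free_block *x = a, *y = b;
        return (x->erase_count > y->erase_count) - (x->erase_count < y->erase_count);
    }

    /* Returns the block number of the least-erased free block. */
    int pick_block(free_block *blocks, int n) {
        qsort(blocks, n, sizeof *blocks, by_wear);
        return blocks[0].block_no;
    }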

Proceedings ArticleDOI
20 Jun 2002
TL;DR: Measurements of thread-local heaps with direct global allocation on a 4-way multiprocessor IBM Netfinity server show that the overall garbage collection times have been substantially reduced, and that most long pauses have been eliminated.
Abstract: We present a memory management scheme for Java based on thread-local heaps. Assuming most objects are created and used by a single thread, it is desirable to free the memory manager from redundant synchronization for thread-local objects. Therefore, in our scheme each thread receives a partition of the heap in which it allocates its objects and in which it does local garbage collection without synchronization with other threads. We dynamically monitor to determine which objects are local and which are global. Furthermore, we suggest using profiling to identify allocation sites that almost exclusively allocate global objects, and allocate objects at these sites directly in a global area.We have implemented the thread-local heap memory manager and a preliminary mechanism for direct global allocation on an IBM prototype of JDK 1.3.0 for Windows. Our measurements of thread-local heaps with direct global allocation on a 4-way multiprocessor IBM Netfinity server show that the overall garbage collection times have been substantially reduced, and that most long pauses have been eliminated.
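
In C terms, the allocation fast path of a thread-local heap looks like the sketch below: each thread bump-allocates from its own partition with no synchronization. Deciding which objects are global (and must go to the shared area) is the hard part the paper handles with dynamic monitoring; partition setup and local GC are elided here.

    /* Thread-local heap fast path (C11 _Thread_local). */
    #include <stddef.h>

    typedef struct { char *cur, *end; } local_heap;

    static _Thread_local local_heap tlh;   /* one partition per thread */

    void *local_alloc(size_t n) {          /* no synchronization needed */
        n = (n + 7) & ~(size_t)7;
        if (tlh.cur + n > tlh.end) return NULL;  /* real code: local GC here */
        void *p = tlh.cur;
        tlh.cur += n;
        return p;
    }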

Proceedings ArticleDOI
01 Jan 2002
TL;DR: This work investigates the complexity of finding the optimal placement of objects (or code) in memory, in the sense that the placement minimizes cache misses, and shows that this problem is one of the toughest among the interesting algorithmic problems in computer science.
Abstract: The growing gap between the speed of memory access and cache access has made cache misses an influential factor in program efficiency. Much effort has been spent recently on reducing the number of cache misses during program run. This effort includes wise rearranging of program code, cache-conscious data placement, and algorithmic modifications that improve the program cache behavior. In this work we investigate the complexity of finding the optimal placement of objects (or code) in the memory, in the sense that this placement reduces the cache misses to the minimum. We show that this problem is one of the toughest amongst the interesting algorithmic problems in computer science. In particular, suppose one is given a sequence of memory accesses and one has to place the data in the memory so as to minimize the number of cache misses for this sequence. We show that if P ≠ NP, then one cannot efficiently approximate the optimal solution even up to a very liberal approximation ratio. Thus, this problem joins the small family of extremely inapproximable optimization problems. The other two famous members in this family are minimum coloring and maximum clique.

Proceedings ArticleDOI
01 Jan 2002
TL;DR: In this article, the authors formalize a general problem of analyzing resource usage as a resource usage analysis problem, and propose a type-based method as a solution to the problem.
Abstract: It is an important criterion of program correctness that a program accesses resources in a valid manner. For example, a memory region that has been allocated should eventually be deallocated, and after the deallocation, the region should no longer be accessed. A file that has been opened should eventually be closed. So far, most methods for analyzing this kind of property have been proposed in rather specific contexts (such as studies of memory management and verification of the usage of lock primitives), and it has not been clear what the essence of those methods is or how methods proposed for individual problems are related. To remedy this situation, we formalize the general problem of analyzing resource usage as a resource usage analysis problem, and propose a type-based method as a solution to the problem.
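
In C terms, the properties such an analysis verifies look like the following: resources must eventually be released and never used afterwards. The second function is exactly the kind of program a resource usage analysis rejects.

    /* Valid and invalid resource usage, side by side. */
    #include <stdio.h>
    #include <stdlib.h>

    void well_behaved(void) {
        FILE *f = fopen("data.txt", "r");   /* acquire */
        if (!f) return;
        fgetc(f);                           /* use while open: fine */
        fclose(f);                          /* release exactly once */
    }

    void rejected_by_analysis(void) {
        char *p = malloc(16);
        free(p);                            /* release */
        p[0] = 'x';                         /* ERROR: use after release */
    }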

Book ChapterDOI
17 Sep 2002
TL;DR: This paper describes the design and implementation of an efficient inclusion-based points-to analysis for strictly-typed object-oriented languages; experimental results demonstrate that the technique can be used to compute precise static call graphs for very large Java programs.
Abstract: We describe the design and implementation of an efficient inclusion-based points-to analysis for strictly-typed object-oriented languages. Our implementation easily scales to millions of lines of Java code, and it supports language features such as inheritance, object fields, exceptional control flow, type casting, dynamic dispatch, and reflection. Our algorithm is based on Heintze and Tardieu's Andersen-style points-to analysis designed originally for C programs. We have improved the precision of their algorithm by tracking the fields of individual objects separately and by analyzing the local variables in a method in a flow-sensitive manner. Our algorithm represents the semantics of each procedure concisely using a sparse summary graph representation based on access paths; it iterates over this sparse representation until it reaches a fixed point solution. By utilizing the access path and field information present in the summary graphs, along with minimizing redundant operations and memory management overheads, we are able to quickly and effectively analyze very large programs. Our experimental results demonstrate that this technique can be used to compute precise static call graphs for very large Java programs.
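
A minimal sketch of Andersen-style propagation, the starting point of the paper's algorithm: copy constraints "dst = src" (so pts[dst] must include pts[src]) are iterated to a fixed point over points-to sets, here represented as bitmasks. Dereference constraints, fields, and the paper's summary graphs are omitted.

    /* Fixed-point propagation over copy constraints. */
    #include <stdint.h>

    #define MAX_VARS 32

    static uint32_t pts[MAX_VARS];          /* bit o set: may point to object o */

    typedef struct { int src, dst; } copy_edge;

    void solve(const copy_edge *e, int nedges) {
        int changed = 1;
        while (changed) {                   /* iterate to a fixed point */
            changed = 0;
            for (int i = 0; i < nedges; i++) {
                uint32_t merged = pts[e[i].dst] | pts[e[i].src];
                if (merged != pts[e[i].dst]) { pts[e[i].dst] = merged; changed = 1; }
            }
        }
    }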

Patent
31 Dec 2002
Abstract: A non-volatile memory device with embodied backup power, comprising a connection port, a power supply unit, and a memory system. A host machine provides data and power to the connection port through an external bus. The memory system holds the data received from the connection port temporarily and transfers the data to a non-volatile memory unit inside the memory system. The power supply unit provides the power necessary to complete the transfer of temporarily stored data inside the memory system to the non-volatile memory unit, so that the data remain readable when host power suddenly fails.

Proceedings ArticleDOI
17 May 2002
TL;DR: Measurements show that for a variety of benchmark programs, code generated by the compiler is as efficient, both with respect to execution time and memory usage, as programs compiled with Standard ML of New Jersey, another state-of-the-art Standard ML compiler.
Abstract: This paper describes a memory discipline that combines region-based memory management and copying garbage collection by extending Cheney's copying garbage collection algorithm to work with regions. The paper presents empirical evidence that region inference very significantly reduces the number of garbage collections; and evidence that the fastest execution is obtained by using regions alone, without garbage collection. The memory discipline is implemented for Standard ML in the ML Kit compiler and measurements show that for a variety of benchmark programs, code generated by the compiler is as efficient, both with respect to execution time and memory usage, as programs compiled with Standard ML of New Jersey, another state-of-the-art Standard ML compiler.
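
For reference, here is the core of Cheney's algorithm that the paper extends to regions, over a toy object layout (a header holding the number of pointer fields). To stay short, the sketch omits forwarding pointers, so shared objects would be copied more than once here; a real collector forwards each object exactly once.

    /* Cheney's copying collection: to-space doubles as the scan queue. */
    #include <stddef.h>
    #include <string.h>

    typedef struct obj { size_t nfields; struct obj *field[]; } obj;

    static char *to_cur;                         /* allocation point in to-space */

    static obj *copy_obj(obj *o) {
        size_t sz = sizeof(obj) + o->nfields * sizeof(obj *);
        obj *c = (obj *)to_cur;
        memcpy(c, o, sz);
        to_cur += sz;
        return c;                                /* real code: leave forwarding ptr */
    }

    void cheney(obj **roots, size_t nroots, char *to_space) {
        to_cur = to_space;
        char *scan = to_space;
        for (size_t i = 0; i < nroots; i++)      /* copy the roots */
            roots[i] = copy_obj(roots[i]);
        while (scan < to_cur) {                  /* breadth-first scan */
            obj *o = (obj *)scan;
            for (size_t i = 0; i < o->nfields; i++)
                if (o->field[i]) o->field[i] = copy_obj(o->field[i]);
            scan += sizeof(obj) + o->nfields * sizeof(obj *);
        }
    }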

Proceedings ArticleDOI
10 Jun 2002
TL;DR: An automatic data migration strategy is described that dynamically places arrays with temporal affinity into the same set of banks, increasing the number of banks that can be put into low-power modes and allowing the use of more aggressive energy-saving modes.
Abstract: An architectural solution to reducing memory energy consumption is to adopt a multi-bank memory system instead of a monolithic (single-bank) memory system. Some recent multi-bank memory architectures help reduce memory energy by allowing an unused bank to be placed into a low-power operating mode. This paper describes an automatic data migration strategy which dynamically places the arrays with temporal affinity into the same set of banks. This strategy increases the number of banks which can be put into low-power modes and allows the use of more aggressive energy-saving modes. Experiments using several array-dominated applications show the usefulness of data migration and indicate that large energy savings can be achieved with low overhead.

Patent
Jong-Deok Choi, Keunwoo Lee, Robert O'Callahan, Vivek Sarkar, Manu Sridharan
25 Jun 2002
TL;DR: In this article, a method of detecting a datarace between first and second memory accesses within a program is proposed, which can detect whether the two accesses are to the same memory location and whether they are executed by different threads.
Abstract: A method of detecting a datarace between first and second memory accesses within a program, including: determining whether the first and second memory accesses are to the same memory location; determining whether the first and second memory accesses are executed by different threads in the program; determining whether the first and second memory accesses are guarded by a common synchronization object; and determining whether there is an execution ordering enforced between the first and second memory accesses.
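
The claim's conditions compose into a simple predicate, sketched below with a bitmask lockset. The "at least one access is a write" and happens-before conditions come from the standard datarace definition, added for completeness rather than taken from the abstract.

    /* Datarace predicate over two recorded accesses. */
    #include <stdint.h>

    typedef struct {
        uintptr_t addr;
        int       thread_id;
        int       is_write;
        uint32_t  locks_held;     /* bit i set: lock i held at the access */
    } access;

    int may_race(const access *a, const access *b, int ordered /* happens-before? */) {
        return a->addr == b->addr                     /* same memory location    */
            && a->thread_id != b->thread_id           /* different threads       */
            && (a->is_write || b->is_write)           /* at least one write      */
            && (a->locks_held & b->locks_held) == 0   /* no common lock          */
            && !ordered;                              /* no enforced ordering    */
    }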

Journal ArticleDOI
TL;DR: To make the DIRECT global optimization algorithm efficient and robust on large-scale, multidisciplinary engineering problems, a set of dynamic data structures is proposed here to balance the memory requirements with execution time, while simultaneously adapting to arbitrary problem size.
Abstract: The DIRECT (DIviding RECTangles) algorithm of Jones, Perttunen, and Stuckman (Journal of Optimization Theory and Applications, vol. 79, no. 1, pp. 157–181, 1993), a variant of Lipschitzian methods for bound constrained global optimization, has proved effective even in higher dimensions. However, the performance of a DIRECT implementation in real applications depends on the characteristics of the objective function, the problem dimension, and the desired solution accuracy. Implementations with static data structures often fail in practice, since it is difficult to predict memory resource requirements in advance. This is especially critical in multidisciplinary engineering design applications, where the DIRECT optimization is just one small component of a much larger computation, and any component failure aborts the entire design process. To make the DIRECT global optimization algorithm efficient and robust on large-scale, multidisciplinary engineering problems, a set of dynamic data structures is proposed here to balance the memory requirements with execution time, while simultaneously adapting to arbitrary problem size. The focus of this paper is on design issues of the dynamic data structures, and related memory management strategies. Numerical computing techniques and modifications of Jones' original DIRECT algorithm in terms of stopping rules and box selection rules are also explored. Performance studies are done for synthetic test problems with multiple local optima. Results for application to a site-specific system simulator for wireless communications systems (S4W) are also presented to demonstrate the effectiveness of the proposed dynamic data structures for an implementation of DIRECT.

Proceedings ArticleDOI
01 Jan 2002
TL;DR: A new type-based approach to garbage collection that has similar attributes but lower cost than generational collection is presented, and the short type pointer technique for reducing memory requirements of objects (data) used by the program is described.
Abstract: In this paper, we introduce the notion of prolific and non-prolific types, based on the number of instantiated objects of those types. We demonstrate that distinguishing between these types enables a new class of techniques for memory management and data locality, and facilitates the deployment of known techniques. Specifically, we first present a new type-based approach to garbage collection that has similar attributes but lower cost than generational collection. Then we describe the short type pointer technique for reducing memory requirements of objects (data) used by the program. We also discuss techniques to facilitate the recycling of prolific objects and to simplify object co-allocation decisions.We evaluate the first two techniques on a standard set of Java benchmarks (SPECjvm98 and SPECjbb2000). An implementation of the type-based collector in the Jalapeno VM shows improved pause times, elimination of unnecessary write barriers, and reduction in garbage collection time (compared to the analogous generational collector) by up to 15%. A study to evaluate the benefits of the short-type pointer technique shows a potential reduction in the heap space requirements of programs by up to 16%.
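
The allocation side of the type-based scheme can be sketched as segregating objects by prolificacy, so that the frequently collected space plays the role a young generation plays in a generational collector. Prolificacy would come from profiling or offline analysis; collection itself is elided in this sketch.

    /* Type-based space segregation, sketched. */
    #include <stddef.h>

    typedef struct { char *cur, *end; } space;

    static space prolific_space, stable_space;   /* set up elsewhere */

    void *alloc_typed(size_t n, int type_is_prolific) {
        /* prolific types get the frequently collected space */
        space *s = type_is_prolific ? &prolific_space : &stable_space;
        n = (n + 7) & ~(size_t)7;
        if (s->cur + n > s->end)
            return NULL;        /* real code: trigger a prolific-space GC */
        void *p = s->cur;
        s->cur += n;
        return p;
    }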