
Showing papers on "Heap (data structure)" published in 2004


Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work uses the separating conjunction to partition the internal resources of a module from those accessed by the module's clients, and gives rise to a form of dynamic partitioning, where the transfer of ownership of portions of heap storage between program components is tracked.
Abstract: We investigate proof rules for information hiding, using the recent formalism of separation logic. In essence, we use the separating conjunction to partition the internal resources of a module from those accessed by the module's clients. The use of a logical connective gives rise to a form of dynamic partitioning, where we track the transfer of ownership of portions of heap storage between program components. It also enables us to enforce separation in the presence of mutable data structures with embedded addresses that may be aliased.

288 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: Garbage-First is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability, while achieving high throughput.
Abstract: Garbage-First is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability, while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with mutation, to prevent interruptions proportional to heap or live-data size. Concurrent marking both provides collection "completeness" and identifies regions ripe for reclamation via compacting evacuation. This evacuation is performed in parallel on multiprocessors, to increase throughput.

258 citations
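
The Garbage-First abstract above says that concurrent marking identifies regions ripe for reclamation. A minimal sketch of that selection step follows, assuming hypothetical per-region live-byte counts and a simple pause budget measured in bytes to copy. The names `Region` and `choose_collection_set` and the cost model are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    size: int   # total bytes in the region
    live: int   # live bytes reported by marking

    @property
    def garbage(self) -> int:
        return self.size - self.live


def choose_collection_set(regions, budget):
    """Pick regions with the most garbage first, as long as the assumed
    evacuation cost (live bytes that must be copied) fits the budget."""
    chosen, cost = [], 0
    for region in sorted(regions, key=lambda r: r.garbage, reverse=True):
        if region.garbage == 0:
            continue  # nothing to reclaim here
        if cost + region.live <= budget:
            chosen.append(region)
            cost += region.live
    return chosen
```

Regions that are mostly garbage are cheap to evacuate and free the most space, which is why this sketch sorts by garbage before applying the budget.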


Proceedings ArticleDOI
01 Jun 2004
TL;DR: This paper explores and quantifies garbage collection behavior for three whole heap collectors and generational counterparts: copying semi-space, mark-sweep, and reference counting, the canonical algorithms from which essentially all other collection algorithms are derived.
Abstract: This paper explores and quantifies garbage collection behavior for three whole heap collectors and generational counterparts: copying semi-space, mark-sweep, and reference counting, the canonical algorithms from which essentially all other collection algorithms are derived. Efficient implementations in MMTk, a Java memory management toolkit, in IBM's Jikes RVM share all common mechanisms to provide a clean experimental platform. Instrumentation separates collector and program behavior, and performance counters measure timing and memory behavior on three architectures. Our experimental design reveals key algorithmic features and how they match program characteristics to explain the direct and indirect costs of garbage collection as a function of heap size on the SPEC JVM benchmarks. For example, we find that the contiguous allocation of copying collectors attains significant locality benefits over free-list allocators. The reduced collection costs of the generational algorithms together with the locality benefit of contiguous allocation motivates a copying nursery for newly allocated objects. These benefits dominate the overheads of generational collectors compared with non-generational and no collection, disputing the myth that "no garbage collection is good garbage collection." Performance is less sensitive to the mature space collection algorithm in our benchmarks. However, the locality and pointer mutation characteristics for a given program occasionally prefer copying or mark-sweep. This study is unique in its breadth of garbage collection algorithms and its depth of analysis.

248 citations


Patent
20 Feb 2004
TL;DR: In this article, the authors propose a caching mechanism for a virtual persistent heap, which divides the VPR heap into cache lines, the smallest amount of VPR space that can be loaded or flushed at one time.
Abstract: A caching mechanism for a virtual persistent heap. A feature of a virtual persistent heap is the method used to cache portions of the virtual persistent heap into the physical heap. The caching mechanism may be effective with small consumer and appliance devices that typically have a small amount of memory and that may be using flash devices as persistent storage. In the caching mechanism, the virtual persistent heap may be divided into cache lines. A cache line is the smallest amount of virtual persistent heap space that can be loaded or flushed at one time. Caching in and caching out operations are used to load cache lines into the heap or to flush dirty cache lines into the store. Different cache line sizes may be used for different regions of the heap. Translation between a virtual persistent heap address and the heap may be simplified by the caching mechanism.

111 citations
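
The caching scheme described in the patent abstract above can be sketched as follows. This is a simplified illustration, not the patented mechanism: it assumes a single cache-line size (the abstract allows different sizes per region), a dictionary standing in for the flash store, and eviction of the oldest-loaded line. The class name `PersistentHeapCache` and its methods are hypothetical.

```python
class PersistentHeapCache:
    def __init__(self, store, line_size, capacity):
        self.store = store          # persistent store: line index -> bytes
        self.line_size = line_size  # bytes per cache line
        self.capacity = capacity    # max lines held in the physical heap
        self.lines = {}             # cached lines: line index -> bytearray
        self.dirty = set()          # indices of lines modified since load

    def _cache_in(self, index):
        """Load a line into the physical heap, evicting one if full."""
        if index not in self.lines:
            if len(self.lines) >= self.capacity:
                self._cache_out(next(iter(self.lines)))  # oldest-loaded line
            data = self.store.get(index, bytes(self.line_size))
            self.lines[index] = bytearray(data)
        return self.lines[index]

    def _cache_out(self, index):
        """Drop a line, flushing it to the store first if it is dirty."""
        line = self.lines.pop(index)
        if index in self.dirty:
            self.store[index] = bytes(line)
            self.dirty.discard(index)

    def read(self, addr):
        index, offset = divmod(addr, self.line_size)
        return self._cache_in(index)[offset]

    def write(self, addr, value):
        index, offset = divmod(addr, self.line_size)
        self._cache_in(index)[offset] = value
        self.dirty.add(index)
```

Address translation reduces to one `divmod` on the virtual address, which is the kind of simplification the abstract alludes to.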


Journal ArticleDOI
TL;DR: A summary of the state of the practice of containment design in copper and gold heap leaching, focusing on recent advancements and how these applications differ from the more conventional landfill design practices is presented in this paper.

99 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: The design, implementation, and empirical evaluation of a novel JVM extension that facilitates dynamic switching between a number of very different and popular garbage collectors are presented and it is shown how to exploit this functionality using annotation-guided GC selection and evaluate the system using a large number of benchmarks.
Abstract: Much prior work has shown that the performance enabled by garbage collection (GC) systems is highly dependent upon the behavior of the application as well as on the available resources. That is, no single GC enables the best performance for all programs and all heap sizes. To address this limitation, we present the design, implementation, and empirical evaluation of a novel Java Virtual Machine (JVM) extension that facilitates dynamic switching between a number of very different and popular garbage collectors. We also show how to exploit this functionality using annotation-guided GC selection and evaluate the system using a large number of benchmarks. In addition, we implement and evaluate a simple heuristic to investigate the efficacy of switching automatically. Our results show that, on average, our annotation-guided system introduces less than 4% overhead and improves performance by 24% over the worst-performing GC (across heap sizes) and by 7% over always using the popular Generational/Mark-Sweep hybrid.

82 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: The results demonstrate that barriers impose surprisingly low cost on the mutator, though results vary by architecture and it is found that second order locality effects were sometimes more important than the overhead of the barriers themselves, leading to counter-intuitive speedups in a number of situations.
Abstract: Modern garbage collectors rely on read and write barriers imposed on heap accesses by the mutator, to keep track of references between different regions of the garbage collected heap, and to synchronize actions of the mutator with those of the collector. It has been a long-standing untested assumption that barriers impose significant overhead to garbage-collected applications. As a result, researchers have devoted effort to development of optimization approaches for elimination of unnecessary barriers, or proposed new algorithms for garbage collection that avoid the need for barriers while retaining the capability for independent collection of heap partitions. On the basis of the results presented here, we dispel the assumption that barrier overhead should be a primary motivator for such efforts. We present a methodology for precise measurement of mutator overheads for barriers associated with mutator heap accesses. We provide a taxonomy of different styles of barrier and measure the cost of a range of popular barriers used for different garbage collectors within Jikes RVM. Our results demonstrate that barriers impose surprisingly low cost on the mutator, though results vary by architecture. We found that the overhead for a reasonable generational write barrier was less than 2% on average, and less than 6% in the worst case. Furthermore, we found that the average overhead of a read barrier consisting of just an unconditional mask of the low order bits read on the PowerPC was only 0.85%, while on the AMD it was 8.05%. With both read and write barriers, we found that second order locality effects were sometimes more important than the overhead of the barriers themselves, leading to counter-intuitive speedups in a number of situations.

80 citations
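
For readers unfamiliar with the barriers measured in the entry above, here is a minimal sketch of one common style of generational write barrier. It is a generic illustration, not one of the specific Jikes RVM barriers from the paper; `Obj`, `remembered`, and `write_field` are hypothetical names.

```python
class Obj:
    def __init__(self, name, old=False):
        self.name = name
        self.old = old      # True if the object lives in the mature space
        self.fields = {}


remembered = set()  # old objects that may point into the nursery


def write_field(src, field, target):
    """Store a reference, recording old-to-young pointers first."""
    if src.old and target is not None and not target.old:
        remembered.add(src)  # the collector scans src at the next nursery GC
    src.fields[field] = target
```

The extra test on every pointer store is the mutator overhead that the paper quantifies.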


Journal ArticleDOI
TL;DR: This article derives a lazy abstract machine from an ordinary call-by-need evaluator that threads a heap of updatable cells, using closure conversion, transformation into continuation-passing style, and defunctionalization of continuations.

77 citations


Patent
18 Oct 2004
TL;DR: In this article, a low-overhead dynamic technology is proposed to improve data locality of C# applications by monitoring objects while the program runs and placing recently accessed objects on the same page(s) on the heap.
Abstract: Applications written in modern garbage collected languages like C# tend to have large dynamic working sets and poor data locality and are therefore likely to spend excess time on managing data movements between memory hierarchies. Instead, a low overhead dynamic technology improves data locality of applications. The technology monitors objects while the program runs and places recently accessed objects on the same page(s) on the heap. Providing increased page density is an effective method for reducing DTLB and/or data cache misses.

72 citations


Journal ArticleDOI
20 Sep 2004
TL;DR: A new technique is presented to statically check a given procedure against a user-provided property, which automatically infers a context-dependent specification for each procedure call, so that only as much information about a procedure is used as is needed to analyze its caller.
Abstract: A new static program analysis method for checking structural properties of code is proposed. The user need only provide a property to check; no further annotations are required. An initial abstraction of the code is computed that over-approximates the effect of function calls. This abstraction is then iteratively refined in response to spurious counterexamples. The refinement involves inferring a context-dependent specification for each function call, so that only as much information about a function is used as is necessary to analyze its caller. When the algorithm terminates, the remaining counterexample is guaranteed not to be spurious, but because the program and its heap are finitized, absence of a counterexample does not constitute proof.

72 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: This work presents an automatic heap-sizing algorithm applicable to different garbage collectors with only modest changes, and shows that its adaptive heap sizing algorithm can substantially reduce running time over fixed-sized heaps.
Abstract: Heap size has a huge impact on the performance of garbage collected applications. A heap that barely meets the application's needs causes excessive GC overhead, while a heap that exceeds physical memory induces paging. Choosing the best heap size a priori is impossible in multiprogrammed environments, where physical memory allocations to processes change constantly. We present an automatic heap-sizing algorithm applicable to different garbage collectors with only modest changes. It relies on an analytical model and on detailed information from the virtual memory manager. The model characterizes the relation between collection algorithm, heap size, and footprint. The virtual memory manager tracks recent reference behavior, reporting the current footprint and allocation to the collector. The collector uses those values as inputs to its model to compute a heap size that maximizes throughput while minimizing paging. We show that our adaptive heap sizing algorithm can substantially reduce running time over fixed-sized heaps.

Journal ArticleDOI
15 Jan 2004
TL;DR: This paper presents the usage and characteristics expected of interactive workspaces, from which it is shown that the design aspects of tuplespaces, augmented with some new extensions, yield a system model, which the paper calls the Event Heap, that satisfies all of the desired properties.
Abstract: The current interest in programming models and software infrastructures to support ubiquitous and environmental computing is heightened by the falling cost of hardware and the ubiquity of local-area wireless networking technologies. Interactive workspaces are technologically augmented team-project rooms that represent a specific sub-domain of ubiquitous computing. We argue both from related work and from our own experience with a prototype that the tuplespace model of communication forms the best basis for a coordination infrastructure for such workspaces. This paper presents the usage and characteristics expected of interactive workspaces, from which we derive a set of key system properties for any coordination infrastructure in an interactive workspace. We show that the design aspects of tuplespaces, augmented with some new extensions, yield a system model, which we call the Event Heap, that satisfies all of the desired properties. We also briefly discuss why other coordination models fall short of the desired properties, and describe our experience using our implementation of the Event Heap model. The paper focuses on a justification of the use of tuplespaces in interactive workspaces, and does not provide a detailed discussion of the Event Heap implementation or our more general experience with interactive workspaces, each of which is treated in detail elsewhere.

Patent
03 Mar 2004
TL;DR: In this paper, an arrangement for using only one bit vector per heap block to improve the concurrency and parallelism of mark-sweep-compact garbage collection in a managed runtime system is provided.
Abstract: An arrangement is provided for using only one bit vector per heap block to improve the concurrency and parallelism of mark-sweep-compact garbage collection in a managed runtime system. A heap may be divided into a number of heap blocks. Each heap block has only one bit vector used for marking, compacting, and sweeping, and in that bit vector only one bit is needed per word or double word in that heap block. Both marking and sweeping phases may proceed concurrently with the execution of applications. Because all information needed for marking, compacting, and sweeping is contained in a bit vector for a heap block, multiple heap blocks may be marked, compacted, or swept in parallel through multiple garbage collection threads. Only a portion of heap blocks may be selected for compaction during each garbage collection to make the compaction incremental to reduce the disruptiveness of compaction to running applications and to achieve a fine load-balance of garbage collection process.
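
A minimal sketch of the per-block bit vector idea described above. It assumes that a set bit means "this word belongs to a live object"; the abstract does not state the exact encoding, so that choice, the class name `HeapBlock`, and its methods are assumptions. Compaction is omitted.

```python
class HeapBlock:
    def __init__(self, num_words):
        self.num_words = num_words
        self.bits = bytearray((num_words + 7) // 8)  # one bit per word

    def mark(self, word):
        self.bits[word >> 3] |= 1 << (word & 7)

    def is_marked(self, word):
        return bool(self.bits[word >> 3] & (1 << (word & 7)))

    def sweep(self):
        """Return (start, length) runs of unmarked words, then clear the bits."""
        free, start = [], None
        for word in range(self.num_words):
            if self.is_marked(word):
                if start is not None:
                    free.append((start, word - start))
                    start = None
            elif start is None:
                start = word
        if start is not None:
            free.append((start, self.num_words - start))
        self.bits = bytearray(len(self.bits))  # reset for the next cycle
        return free
```

Because each block carries all of its own marking state, separate collector threads could process separate `HeapBlock` instances without sharing data, which is the parallelism the abstract points to.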

Patent
15 Jul 2004
TL;DR: In this article, the authors propose a method of detecting memory leaks by adaptive bursty tracing of a program execution to track accesses to heap objects with low overhead, using this information to identify stale heap objects, which are reported as leaks.
Abstract: A method of detecting memory leaks. The method comprises adaptive bursty tracing of a program execution to track accesses to heap objects with low overhead, and using this information to identify stale heap objects, which are reported as leaks.
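
The staleness idea in this entry can be sketched in a few lines. The sketch records every access rather than sampling in bursts, so it illustrates only the "stale object" criterion and not the low-overhead tracing; `StalenessTracker` and the tick-based threshold are hypothetical.

```python
class StalenessTracker:
    def __init__(self):
        self.clock = 0          # advances by one on every observed access
        self.last_access = {}   # object id -> tick of most recent access

    def allocate(self, obj_id):
        self.last_access[obj_id] = self.clock

    def access(self, obj_id):
        self.clock += 1
        self.last_access[obj_id] = self.clock

    def free(self, obj_id):
        self.last_access.pop(obj_id, None)

    def stale(self, threshold):
        """Objects still allocated but untouched for at least `threshold` ticks."""
        return sorted(obj for obj, tick in self.last_access.items()
                      if self.clock - tick >= threshold)
```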

Proceedings ArticleDOI
01 Oct 2004
TL;DR: This work proposes a heap compaction algorithm appropriate for modern computing environments that achieves (almost) perfect compaction in the lower addresses of the heap, whereas previous algorithms achieved parallelism by compacting within several predetermined segments.
Abstract: We propose a heap compaction algorithm appropriate for modern computing environments. Our algorithm is targeted at SMP platforms. It demonstrates high scalability when running in parallel but is also extremely efficient when running single-threaded on a uniprocessor. Instead of using the standard forwarding pointer mechanism for updating pointers to moved objects, the algorithm saves information for a pack of objects. It then does a small computation to process this information and determine each object's new location. In addition, using a smart parallel moving strategy, the algorithm achieves (almost) perfect compaction in the lower addresses of the heap, whereas previous algorithms achieved parallelism by compacting within several predetermined segments. Next, we investigate a method that trades compaction quality for a further reduction in time and space overhead. Finally, we propose a modern version of the two-finger compaction algorithm. This algorithm fails, thus, re-validating traditional wisdom asserting that retaining the order of live objects significantly improves the quality of the compaction. The parallel compaction algorithm was implemented on the IBM production Java Virtual Machine. We provide measurements demonstrating high efficiency and scalability. Subsequently, this algorithm has been incorporated into the IBM production JVM.

Patent
Patrick H. Dussud
16 Sep 2004
TL;DR: In this article, a garbage collection system and method in a multiprocessor environment having a shared memory wherein two or more processing units participate in the reclamation of garbage memory objects is presented.
Abstract: A garbage collection system and method in a multiprocessor environment having a shared memory wherein two or more processing units participate in the reclamation of garbage memory objects. The shared memory is divided into regions, or heaps, and each heap is dedicated to one of the participating processing units. Each processing unit generally performs garbage collection operations, i.e., runs a thread, on the heap or heaps that are dedicated to that processing unit. However, the processing units are also allowed to access and modify other memory objects in other heaps when those objects are referenced by, and therefore may be traced back to, memory objects within the processing unit's dedicated heap. The processors are synchronized at rendezvous points to prevent reclamation of used memory objects.

Proceedings ArticleDOI
07 Jun 2004
TL;DR: From empirical study, it is found that restriction based on escape information is often, but not always, sufficient at prohibiting the explosive nature of specialization.
Abstract: Specialization of heap objects is critical for pointer analysis to effectively analyze complex memory activity. This paper discusses heap specialization with respect to call chains. Due to the sheer number of distinct call chains, exhaustive specialization can be cumbersome. On the other hand, insufficient specialization can miss valuable opportunities to prevent spurious data flow, which results in not only reduced accuracy but also increased overhead. In determining whether further specialization will be fruitful, an object's escape information can be exploited. From empirical study, we found that restriction based on escape information is often, but not always, sufficient at prohibiting the explosive nature of specialization. For in-depth case study, four representative benchmarks are selected. For each benchmark, we vary the degree of heap specialization and examine its impact on analysis results and time. To provide better visibility into the impact, we present the points-to set and pointed-to-by set sizes in the form of histograms.

Book ChapterDOI
08 Jul 2004
TL;DR: In this paper, the authors present improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with non-negative edge weights.
Abstract: We present improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with non-negative edge weights. Our results remove the performance gap between the currently best cache-aware algorithms for these problems and their cache-oblivious counterparts. Our shortest-path algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DecreaseKey operation.

Journal ArticleDOI
01 Mar 2004
TL;DR: It turns out that the correspondences between data representations cannot simply be relations between states, but more intricate correspondences that also need to keep track of visible locations whose pointers can be stored and leaked.
Abstract: While the semantics of local variables in programming languages is by now well-understood, the semantics of pointer-addressed heap variables is still an outstanding issue. In particular, the commonly assumed relational reasoning principles for data representations have not been validated in a semantic model of heap variables. In this paper, we define a parametricity semantics for a Pascal-like language with pointers and heap variables which gives such reasoning principles. It turns out that the correspondences between data representations cannot simply be relations between states, but more intricate correspondences that also need to keep track of visible locations whose pointers can be stored and leaked.

DOI
01 Jan 2004
TL;DR: Chapter outline covering the cache-oblivious model, fundamental primitives (van Emde Boas layout, k-merger), dynamic B-trees, priority queues, and 2d orthogonal range searching.
Abstract: University of Southern Denmark. Chapter outline: 38.1 The Cache-Oblivious Model; 38.2 Fundamental Primitives (Van Emde Boas Layout, k-Merger); 38.3 Dynamic B-Trees (Density Based, Exponential Tree Based); 38.4 Priority Queues (Merge Based Priority Queue: Funnel Heap, Exponential Level Based Priority Queue); 38.5 2d Orthogonal Range Searching (Cache-Oblivious kd-Tree, Cache-Oblivious Range Tree).

Proceedings ArticleDOI
David F. Bacon, Perry Cheng, David Grove
27 Sep 2004
TL;DR: This work has implemented two different collectors specifically designed to operate well on small embedded devices and developed a number of algorithmic improvements and compression techniques that allow them to eliminate almost all of the per-object overhead that the virtual machine and the garbage collector require.
Abstract: Security concerns on embedded devices like cellular phones make Java an extremely attractive technology for providing third-party and user-downloadable functionality. However, garbage collectors have typically required several times the maximum live data set size (which is the minimum possible heap size) in order to run well. In addition, the size of the virtual machine (ROM) image and the size of the collector's data structures (metadata) have not been a concern for server- or workstation-oriented collectors. We have implemented two different collectors specifically designed to operate well on small embedded devices. We have also developed a number of algorithmic improvements and compression techniques that allow us to eliminate almost all of the per-object overhead that the virtual machine and the garbage collector require. We describe these optimizations and present measurements of the Java embedded benchmarks (EEMBC) of our implementations on both an IA32 laptop and an ARM-based PDA. For applications with low to moderate allocation rates, our optimized collector running on the ARM is able to achieve 85% of peak performance with only 1.05 to 1.3 times the absolute minimum heap size. For applications with high allocation rates, the collector achieves 85% of peak performance with 1.75 to 2.5 times the minimum heap size. The collector code takes up 40 KB of ROM, and collector metadata overhead has been almost completely eliminated, consuming only 0.4% of the heap.

Book ChapterDOI
Radu Rugina
26 Aug 2004
TL;DR: In this article, the authors present a static analysis that computes quantitative information for recursive heap structures in programs with destructive updates, which is able to extract quantitative information about the height and the balancing of such structures and verify the correctness of rebalancing operations after AVL tree insertions.
Abstract: This paper presents a static analysis that computes quantitative information for recursive heap structures in programs with destructive updates. The algorithm targets tree structures and is able to extract quantitative information about the height and the balancing of such structures. We formulate the algorithm as a dataflow analysis. We use a heap abstraction that captures both shape invariants and quantities of heap structures. Then, we give a precise specification of the transfer functions that describe how each statement updates this abstraction. The algorithm is able to verify the correctness of re-balancing operations after AVL tree insertions.

Patent
02 Sep 2004
TL;DR: In this paper, a method of managing a memory heap includes allocating a first portion of the memory heap to a young section, with the first portion having a faster access time than at least one of the second and third portions of the heap.
Abstract: A method of managing a memory heap includes allocating a first portion of the memory heap to a young section. The first portion has a faster access time than at least one of a second portion and a third portion of the memory heap. The second portion is allocated to a tenured section, and the third portion includes an unused section. The method also includes filling the young section with objects from an application and deleting any objects in the young section that are no longer referenced. Any referenced objects are shifted. A memory system is also described herein.

Patent
08 Apr 2004
TL;DR: The Event Heap system is an efficient and adaptive middleware infrastructure that dynamically coordinates application interactions and communications in a ubiquitous computing environment, e.g., an interactive workspace, having heterogeneous software applications running on various machines and devices across different platforms.
Abstract: An efficient and adaptive middleware infrastructure called the Event Heap system dynamically coordinates application interactions and communications in a ubiquitous computing environment, e.g., an interactive workspace, having heterogeneous software applications running on various machines and devices across different platforms. Applications exchange events via the Event Heap. Each event is characterized by a set of unordered, named fields. Events are routed by matching certain attributes in the fields. The source and target versions of each field are automatically set when an event is posted or used as a template. The Event Heap system implements a unique combination of features, both intrinsic to tuplespaces and specific to the Event Heap, including content based addressing, support for routing patterns, standard routing fields, limited data persistence, query persistence/registration, transparent communication, self-description, flexible typing, logical/physical centralization, portable client API, at most once per source first-in-first-out ordering, and modular restartability.
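
The field-matching behavior described above (events as sets of named fields, retrieved by template) can be illustrated with a small tuplespace-style sketch. It is not the Event Heap API: `EventHeap`, `ANY`, `post`, `read`, and `take` are hypothetical names, and features such as expiry, source/target routing fields, and query registration are left out.

```python
from collections import deque

ANY = object()  # wildcard marker for a template field (hypothetical)


class EventHeap:
    def __init__(self):
        self.events = deque()  # oldest event first

    def post(self, event):
        self.events.append(dict(event))

    @staticmethod
    def _matches(template, event):
        """Every template field must exist in the event and agree,
        unless the template value is the ANY wildcard."""
        return all(key in event and (value is ANY or event[key] == value)
                   for key, value in template.items())

    def read(self, template):
        """Return the oldest matching event without removing it, or None."""
        for event in self.events:
            if self._matches(template, event):
                return event
        return None

    def take(self, template):
        """Like read, but also remove the matched event."""
        event = self.read(template)
        if event is not None:
            self.events.remove(event)
        return event
```

Senders and receivers never name each other in this sketch; they only agree on field names, which is the content-based addressing the abstract lists.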

Proceedings ArticleDOI
20 Mar 2004
TL;DR: This paper considers software techniques for virtual machines that allow 32-bit pointers to be used on 64-bit CPUs for managed runtime applications that do not need the full 64-bit address space.
Abstract: 64-bit processor architectures like the Intel® Itanium® processor family are designed for large applications that need large memory addresses. When running applications that fit within a 32-bit address space, 64-bit CPUs are at a disadvantage compared to 32-bit CPUs because of the larger memory footprints for their data. This results in worse cache and TLB utilization, and consequently lower performance because of increased miss ratios. This paper considers software techniques for virtual machines that allow 32-bit pointers to be used on 64-bit CPUs for managed runtime applications that do not need the full 64-bit address space. We describe our pointer compression techniques and discuss our experience implementing these for Java applications. In addition, we give performance results with our techniques for both the SPEC JVM98 and SPEC JBB2000 benchmarks. We demonstrate a 12% performance improvement on SPEC JBB2000 and a reduction in the number of garbage collections required for a given heap size.
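
The abstract does not spell out the encoding, so the following sketch shows one common way to compress pointers, assumed here for illustration: store the offset from a heap base, scaled by the object alignment. `HEAP_BASE`, `ALIGN_SHIFT`, `compress`, and `decompress` are hypothetical.

```python
HEAP_BASE = 0x0000_7000_0000_0000  # hypothetical start address of the heap
ALIGN_SHIFT = 3                    # assume objects are 8-byte aligned


def compress(addr):
    """Map a 64-bit heap address to a 32-bit reference (0 stays null)."""
    if addr == 0:
        return 0
    offset = addr - HEAP_BASE
    if offset <= 0 or offset & ((1 << ALIGN_SHIFT) - 1):
        raise ValueError("address outside heap or misaligned")
    ref = offset >> ALIGN_SHIFT
    if ref >= 1 << 32:
        raise ValueError("address does not fit in 32 bits")
    return ref


def decompress(ref):
    """Recover the full 64-bit address from a compressed reference."""
    return 0 if ref == 0 else HEAP_BASE + (ref << ALIGN_SHIFT)
```

Halving the size of each reference field is what shrinks the memory footprint and improves cache and TLB utilization; the price is a shift and an add on each dereference.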

Patent
28 Dec 2004
TL;DR: In this article, a system and method to monitor a virtual machine (VM) are presented, where local objects are created and stored within an internal heap maintained by the VM, and status data of the internal heap is published to monitoring memory external to the VM.
Abstract: A system and method to monitor a virtual machine (VM). The VM executes one or more applications. During execution of the one or more applications, local objects are created and stored within an internal heap maintained by the VM. Status data of the internal heap is published to monitoring memory external to the VM.

Journal ArticleDOI
TL;DR: A probabilistic points-to analysis technique to compute the probability of every points-to relationship at each program point is proposed and implemented based on the iterative data flow analysis framework.
Abstract: When performing aggressive optimizations and parallelization to exploit features of advanced architectures, optimizing and parallelizing compilers need to quantitatively assess the profitability of any transformations in order to achieve high performance. Useful optimizations and parallelization can be performed if it is known that certain points-to relationships would hold with high or low probabilities. For instance, if the probabilities are low, a compiler could transform programs to perform data speculation or partition iterations into threads in speculative multithreading, or it would avoid conducting code specialization. Consequently, it is essential for compilers to incorporate pointer analysis techniques that can estimate the possibility for every points-to relationship that it would hold during the execution. However, conventional pointer analysis techniques do not provide such quantitative descriptions and, thus, hinder compilers from more aggressive optimizations, such as thread partitioning in speculative multithreading, data speculations, code specialization, etc. We address this issue by proposing a probabilistic points-to analysis technique to compute the probability of every points-to relationship at each program point. A context-sensitive interprocedural algorithm has been implemented based on the iterative data flow analysis framework, and has been incorporated into SUIF and MachSUIF. Experimental results show this technique can estimate the probabilities of points-to relationships in benchmark programs with reasonably small errors, about 4.6 percent on average. Furthermore, the current implementation cannot disambiguate heap and array elements. The errors are further significantly reduced when the future implementation incorporates techniques to disambiguate heap and array elements.
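
To make "the probability of every points-to relationship at each program point" concrete, here is one plausible way a data flow analysis could combine facts at a control-flow join, weighting each incoming branch by its execution probability. This is an assumption for illustration, not the transfer function from the paper; `merge` is a hypothetical helper.

```python
def merge(branches):
    """Combine points-to distributions at a join point.

    branches: list of (branch_probability, {target: probability}) pairs,
    one per incoming edge, for a single pointer variable.
    """
    merged = {}
    for branch_probability, points_to in branches:
        for target, probability in points_to.items():
            merged[target] = merged.get(target, 0.0) + branch_probability * probability
    return merged
```

For example, if `p` points to `x` with certainty on a branch taken 75% of the time, and to `x` or `y` with equal probability on the other branch, the merged estimate is 0.875 for `x` and 0.125 for `y`.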

Journal ArticleDOI
TL;DR: In this paper, the authors explore and quantify garbage collection behavior for three whole heap collectors and generational counterparts: copying semi-space, mark-sweep, and reference counting, the canonical a...
Abstract: This paper explores and quantifies garbage collection behavior for three whole heap collectors and generational counterparts: copying semi-space, mark-sweep, and reference counting, the canonical a...

Patent
16 Jul 2004
TL;DR: In this article, a low overhead method for identifying memory leaks is provided, which includes detecting completion of a garbage collection cycle, identifying a boundary between used objects in memory and free memory space, and then determining if there is an existing memory leak based upon evaluation of boundary identifiers.
Abstract: A low overhead method for identifying memory leaks is provided. The low overhead method includes a) detecting completion of a garbage collection cycle; and b) identifying a boundary between used objects in memory and free memory space. The steps of a) and b) are repeated and then it is determined if there is an existing memory leak based upon evaluation of boundary identifiers. A computer-readable medium and a system for identifying memory leaks for an object-oriented application are also provided.
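
The abstract does not say how the boundary identifiers are evaluated. A minimal sketch under one assumed criterion: flag a possible leak when the used/free boundary recorded after each garbage collection has risen strictly over the last few cycles. `leak_suspected` and `window` are hypothetical.

```python
def leak_suspected(boundaries, window=4):
    """boundaries: position of the used/free boundary after each GC cycle.

    Returns True if the last `window` recorded boundaries strictly increase.
    """
    if len(boundaries) < window:
        return False  # not enough cycles observed yet
    recent = boundaries[-window:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

The check only needs one number per collection cycle, which is consistent with the low overhead the abstract emphasizes.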

17 May 2004
TL;DR: The analysis is built on top of a combined pointer and escape analysis for Java programs and is capable of determining that methods are pure even when the methods do heap mutation, provided that the mutation affects only objects created after the beginning of the method.
Abstract: We present a new method purity analysis for Java programs. A method is pure if it does not mutate any location that exists in the program state right before method invocation. Our analysis is built on top of a combined pointer and escape analysis for Java programs and is capable of determining that methods are pure even when the methods do heap mutation, provided that the mutation affects only objects created after the beginning of the method. Because our analysis extracts a precise representation of the region of the heap that each method may access, it is able to provide useful information even for methods with externally visible side effects. In particular, it can recognize read-only parameters (a parameter is read-only if the method does not mutate any objects transitively reachable from the parameter) and safe parameters (a parameter is safe if it is read-only and the method does not create any new externally visible paths in the heap to objects transitively reachable from the parameter). The analysis can also generate regular expressions that characterize the externally visible heap locations that the method mutates. We have implemented our analysis and used it to analyze several data structure implementations. Our results show that our analysis effectively recognizes a variety of pure methods, including pure methods that allocate and mutate complex auxiliary data structures. Even if the methods are not pure, our analysis can provide information which may enable developers to usefully bound the potential side effects of the method.