scispace - formally typeset
Search or ask a question
Topic

MLton

About: MLton is a research topic. Over the lifetime, 32 publications have been published within this topic receiving 433 citations.

Papers
More filters
Proceedings ArticleDOI
04 Sep 2016
TL;DR: A technique for parallel memory management for strict functional languages with nested parallelism, which consists of a technique to organize memory as a hierarchy of heaps, and an algorithm for performing automatic memory reclamation by taking advantage of a disentanglement property of parallel functional programs.
Abstract: An important feature of functional programs is that they are parallel by default. Implementing an efficient parallel functional language, however, is a major challenge, in part because the high rate of allocation and freeing associated with functional programs requires an efficient and scalable memory manager. In this paper, we present a technique for parallel memory management for strict functional languages with nested parallelism. At the highest level of abstraction, the approach consists of a technique to organize memory as a hierarchy of heaps, and an algorithm for performing automatic memory reclamation by taking advantage of a disentanglement property of parallel functional programs. More specifically, the idea is to assign to each parallel task its own heap in memory and organize the heaps in a hierarchy/tree that mirrors the hierarchy of tasks. We present a nested-parallel calculus that specifies hierarchical heaps and prove in this calculus a disentanglement property, which prohibits a task from accessing objects allocated by another task that might execute in parallel. Leveraging the disentanglement property, we present a garbage collection technique that can operate on any subtree in the memory hierarchy concurrently as other tasks (and/or other collections) proceed in parallel. We prove the safety of this collector by formalizing it in the context of our parallel calculus. In addition, we describe how the proposed techniques can be implemented on modern shared-memory machines and present a prototype implementation as an extension to MLton, a high-performance compiler for the Standard ML language. Finally, we evaluate the performance of this implementation on a number of parallel benchmarks.

31 citations

Journal ArticleDOI
TL;DR: The rationale, design, and implementation of MultiMLton are described, and experimental results over a range of parallel benchmarks and different multicore architectures including an 864 core Azul Vega 3, and a 48 core non-coherent Intel SCC (Single-Cloud Computer), that justify the design decisions are provided.
Abstract: MULTIMLTON is an extension of the MLton compiler and runtime system that targets scalable, multicore architectures. It provides specific support for ACML, a derivative of Concurrent ML that allows for the construction of composable asynchronous events. To effectively manage asynchrony, we require the runtime to efficiently handle potentially large numbers of lightweight, short-lived threads, many of which are created specifically to deal with the implicit concurrency introduced by asynchronous events. Scalability demands also dictate that the runtime minimize global coordination. MULTIMLTON therefore implements a split-heap memory manager that allows mutators and collectors running on different cores to operate mostly independently. More significantly, MULTIMLTON exploits the premise that there is a surfeit of available concurrency in ACML programs to realize a new collector design that completely eliminates the need for read barriers, a source of significant overhead in other managed runtimes. These two symbiotic features - a thread design specifically tailored to support asynchronous communication, and a memory manager that exploits lightweight concurrency to greatly reduce barrier overheads - are MULTIMLTON’s key novelties. In this article, we describe the rationale, design, and implementation of these features, and provide experimental results over a range of parallel benchmarks and different multicore architectures including an 864 core Azul Vega 3, and a 48 core non-coherent Intel SCC (Single-Cloud Computer), that justify our design decisions.

29 citations

Journal ArticleDOI
20 Sep 2008
TL;DR: The proposed compilation technique can produce self-adjusting programs whose performance is consistent with the asymptotic bounds and experimental results obtained via manual rewriting (up to a constant factor) and is validated by extending Standard ML and by integrating the transformation into MLton, a whole-program optimizing compiler for Standard ML.
Abstract: Self-adjusting programs respond automatically and efficiently to input changes by tracking the dynamic data dependences of the computation and incrementally updating the output as needed. In order to identify data dependences, previously proposed approaches require the user to make use of a set of monadic primitives. Rewriting an ordinary program into a self-adjusting program with these primitives, however, can be difficult and error-prone due to various monadic and proper-usage restrictions, some of which cannot be enforced statically. Previous work therefore suggests that self-adjusting computation would benefit from direct language and compiler support.In this paper, we propose a language-based technique for writing and compiling self-adjusting programs from ordinary programs. To compile self-adjusting programs, we use a continuation-passing style (cps) transformation to automatically infer a conservative approximation of the dynamic data dependences. To prevent the inferred, approximate dependences from degrading the performance of change propagation, we generate memoized versions of cps functions that can reuse previous work even when they are invoked with different continuations. The approach offers a natural programming style that requires minimal changes to existing code, while statically enforcing the invariants required by self-adjusting computation.We validate the feasibility of our proposal by extending Standard ML and by integrating the transformation into MLton, a whole-program optimizing compiler for Standard ML. Our experiments indicate that the proposed compilation technique can produce self-adjusting programs whose performance is consistent with the asymptotic bounds and experimental results obtained via manual rewriting (up to a constant factor).

26 citations

Journal ArticleDOI
20 Dec 2019
TL;DR: This paper identifies a memory property, called disentanglement, of nested-parallel programs, and proposes memory management techniques that take advantage of disENTanglement for improved efficiency and scalability and shows that these techniques are practical by extending the MLton compiler for Standard ML to support this form of nested parallelism.
Abstract: Nested parallelism has proved to be a popular approach for programming the rapidly expanding range of multicore computers. It allows programmers to express parallelism at a high level and relies on a run-time system and a scheduler to deliver efficiency and scalability. As a result, many programming languages and extensions that support nested parallelism have been developed, including in C/C++, Java, Haskell, and ML. Yet, writing efficient and scalable nested parallel programs remains challenging, primarily due to difficult concurrency bugs arising from destructive updates or effects. For decades, researchers have argued that functional programming can simplify writing parallel programs by allowing more control over effects but functional programs continue to underperform in comparison to parallel programs written in lower-level languages. The fundamental difficulty with functional languages is that they have high demand for memory, and this demand only grows with parallelism. In this paper, we identify a memory property, called disentanglement, of nested-parallel programs, and propose memory management techniques for improved efficiency and scalability. Disentanglement allows for (destructive) effects as long as concurrently executing threads do not gain knowledge of the memory objects allocated by each other. We formally define disentanglement by considering an ML-like higher-order language with mutable references and presenting a dynamic semantics for it that enables reasoning about computation graphs of nested parallel programs. Based on this graph semantics, we formalize a classic correctness property---determinacy race freedom---and prove that it implies disentanglement. This establishes that disentanglement applies to a relatively broad class of parallel programs. We then propose memory management techniques for nested-parallel programs that take advantage of disentanglement for improved efficiency and scalability. We show that these techniques are practical by extending the MLton compiler for Standard ML to support this form of nested parallelism. Our empirical evaluation shows that our techniques are efficient and scale well.

22 citations

Proceedings ArticleDOI
10 Feb 2018
TL;DR: This paper couple the memory manager with the thread scheduler to make reading and updating data allocated by nested threads efficient, and proposes techniques for efficient mutation in parallel functional languages.
Abstract: It is well known that modern functional programming languages are naturally amenable to parallel programming. Achieving efficient parallelism using functional languages, however, remains difficult. Perhaps the most important reason for this is their lack of support for efficient in-place updates, i.e., mutation, which is important for the implementation of both parallel algorithms and the run-time system services (e.g., schedulers and synchronization primitives) used to execute them. In this paper, we propose techniques for efficient mutation in parallel functional languages. To this end, we couple the memory manager with the thread scheduler to make reading and updating data allocated by nested threads efficient. We describe the key algorithms behind our technique, implement them in the MLton Standard ML compiler, and present an empirical evaluation. Our experiments show that the approach performs well, significantly improving efficiency over existing functional language implementations.

17 citations


Network Information
Related Topics (5)
Concurrency
13K papers, 347.1K citations
85% related
Compiler
26.3K papers, 578.5K citations
84% related
Data structure
28.1K papers, 608.6K citations
83% related
Executable
24K papers, 391.1K citations
82% related
Logic programming
11.1K papers, 274.2K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20213
20202
20191
20183
20164
20141