Author

Gabriel Poesia

Bio: Gabriel Poesia is an academic researcher from Stanford University. The author has contributed to research in topics including natural language and call graphs. The author has an h-index of 2 and has co-authored 6 publications receiving 18 citations. Previous affiliations of Gabriel Poesia include Universidade Federal de Minas Gerais.

Papers
Journal ArticleDOI
12 Oct 2017
TL;DR: Etino, as described in this paper, analyzes the program's call graph to determine the best processor for each calling context; the analysis is parameterized by a cost model that takes processor characteristics and data-transfer time into account.
Abstract: Heterogeneous architectures characterize today's hardware, ranging from super-computers to smartphones. However, in spite of this importance, programming such systems is still challenging. In particular, it is challenging to map computations to the different processors of a heterogeneous device. In this paper, we provide a static analysis that mitigates this problem. Our contributions are two-fold: first, we provide a semi-context-sensitive algorithm, which analyzes the program's call graph to determine the best processor for each calling context. This algorithm is parameterized by a cost model, which takes into consideration processor characteristics and data transfer time. Second, we show how to use simulated annealing to calibrate this cost model for a given heterogeneous architecture. We have used our ideas to build Etino, a tool that annotates C programs with OpenACC or OpenMP 4.0 directives. Etino generates code for a CPU-GPU architecture without user intervention. Experiments on classic benchmarks reveal speedups of up to 75x. Moreover, our calibration process lets us avoid slowdowns of up to 720x that trivial parallelization approaches would yield.
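
The cost-model idea lends itself to a compact illustration. Below is a minimal Python sketch of the decision such an analysis makes per calling context: pick the processor that minimizes estimated compute time plus data-transfer time. The throughput and bandwidth figures are invented for illustration and are not Etino's calibrated model.

```python
# Hypothetical sketch of the kind of cost model the analysis is
# parameterized by. All numbers and names are illustrative.

def best_processor(work_flops, bytes_transferred, procs):
    """Return the processor with the lowest estimated total cost."""
    def cost(p):
        compute = work_flops / p["flops_per_sec"]
        transfer = bytes_transferred / p["bandwidth"] if p["needs_transfer"] else 0.0
        return compute + transfer
    return min(procs, key=cost)

processors = [
    {"name": "cpu", "flops_per_sec": 5e10, "bandwidth": 0.0, "needs_transfer": False},
    {"name": "gpu", "flops_per_sec": 1e12, "bandwidth": 1e10, "needs_transfer": True},
]

# A small kernel with lots of data: transfer cost dominates, CPU wins.
print(best_processor(1e8, 8e8, processors)["name"])   # cpu
# A large kernel: the GPU's throughput pays for the transfer.
print(best_processor(1e12, 8e8, processors)["name"])  # gpu
```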

17 citations

Journal ArticleDOI
13 Nov 2020
TL;DR: This paper combines finite state machines and dynamic dispatching to allow fully context-sensitive specialization while cloning only functions that are effectively optimized, which makes it possible to apply very liberal optimizations, such as context-sensitive constant propagation, in large programs—something that could not have been easily done before.
Abstract: Academia has put much effort into making context-sensitive analyses practical, with great profit. However, the implementation of context-sensitive optimizations, in contrast to analyses, is still not practical, due to code-size explosion. This growth happens because current technology requires the cloning of full paths in the Calling Context Tree. In this paper, we present a solution to this problem. We combine finite state machines and dynamic dispatching to allow fully context-sensitive specialization while cloning only functions that are effectively optimized. This technique makes it possible to apply very liberal optimizations, such as context-sensitive constant propagation, in large programs—something that could not have been easily done before. We demonstrate the viability of our idea by formalizing it in Prolog, and implementing it in LLVM. As a proof of concept, we have used our state machines to implement context-sensitive constant propagation in LLVM. The binaries produced by traditional full cloning are 2.63 times larger than the binaries that we generate with our state machines. When applied on Mozilla Firefox, our optimization increases binary size from 7.2MB to 9.2MB. Full cloning, in contrast, yields a binary of 34MB.
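
To make the mechanism concrete, here is a toy Python rendering (not the paper's LLVM implementation) of the idea: a tiny state machine tracks the calling context, and a dispatch table maps the current state to a clone of the function specialized for that context. The states and the specialized version are invented for illustration.

```python
# States of a hypothetical calling-context automaton.
FROM_MAIN, FROM_HOT_LOOP = 0, 1
state = FROM_MAIN

def scale_generic(x, factor):
    return x * factor

def scale_hot(x, _factor):
    # Specialized clone: in the hot-loop context, factor is known to be 2,
    # so context-sensitive constant propagation folds it into a shift.
    return x << 1

DISPATCH = {FROM_MAIN: scale_generic, FROM_HOT_LOOP: scale_hot}

def scale(x, factor):
    # Dynamic dispatch on the automaton state picks the right clone, so
    # only contexts that are effectively optimized need a clone at all.
    return DISPATCH[state](x, factor)

state = FROM_HOT_LOOP
print(scale(21, 2))  # 42, via the specialized version
```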

3 citations

Book ChapterDOI
15 Sep 2014
TL;DR: Experiments on two real-life datasets show that the pre-processing step, which identifies and erases n-tuples whose removal does not change the collection of patterns to be discovered, lowers the overall running time by a factor typically ranging from 10 to 100.
Abstract: Given a binary relation, listing the itemsets takes exponential time. The problem grows worse when searching for analogous patterns defined in n-ary relations. However, real-life relations are sparse and, with a greater number n of dimensions, they tend to be even sparser. Moreover, not all itemsets are searched: only those satisfying some user-defined constraints, such as minimal size constraints. This article proposes to exploit together the sparsity of the relation and the presence of constraints satisfying a common property, monotonicity w.r.t. one dimension. It details a pre-processing step to identify and erase n-tuples whose removal does not change the collection of patterns to be discovered. That reduction of the relation is achieved in time and space linear in the number of n-tuples. Experiments on two real-life datasets show that, whatever the algorithm used afterward to actually list the patterns, the pre-processing lowers the overall running time by a factor typically ranging from 10 to 100.
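
As a rough illustration of the pruning flavor (not the article's linear-time reduction), the Python sketch below iteratively erases tuples that cannot appear in any pattern meeting per-dimension minimal-size constraints, using a crude sufficient test; the relation and thresholds are made up.

```python
# A tuple survives only if, fixing its value on dimension d, the tuples
# that agree with it there still span enough distinct values on every
# other dimension. This conveys the idea; the article's reduction is
# tighter and runs in linear time.

def prune(tuples, min_sizes):
    tuples = set(tuples)
    changed = True
    while changed:
        changed = False
        for t in list(tuples):
            for d, v in enumerate(t):
                block = [u for u in tuples if u[d] == v]
                for e in range(len(t)):
                    if e != d and len({u[e] for u in block}) < min_sizes[e]:
                        tuples.discard(t)
                        changed = True
                        break
                if t not in tuples:
                    break
    return tuples

rel = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (2, 9, 9)}
print(sorted(prune(rel, (2, 2, 1))))  # the isolated (2, 9, 9) is erased
```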

2 citations

Posted Content
TL;DR: This article proposes a question-asking model capable of producing polar (yes-no) clarification questions to resolve misunderstandings in dialogue, and demonstrates the model's ability to pose questions that improve communicative success in a goal-oriented 20 questions game with synthetic and human answerers.
Abstract: An overarching goal of natural language processing is to enable machines to communicate seamlessly with humans. However, natural language can be ambiguous or unclear. In cases of uncertainty, humans engage in an interactive process known as repair: asking questions and seeking clarification until their uncertainty is resolved. We propose a framework for building a visually grounded question-asking model capable of producing polar (yes-no) clarification questions to resolve misunderstandings in dialogue. Our model uses an expected information gain objective to derive informative questions from an off-the-shelf image captioner without requiring any supervised question-answer data. We demonstrate our model's ability to pose questions that improve communicative success in a goal-oriented 20 questions game with synthetic and human answerers.
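
The expected-information-gain objective is easy to sketch. The toy Python below scores a polar question by the expected entropy reduction it causes over a belief about hypotheses; the hypotheses and answer likelihoods are invented, and the paper derives its questions from an image captioner rather than a hand-built table.

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def info_gain(belief, yes_given_h):
    """Expected entropy reduction from a yes/no question.

    belief: P(h) over hypotheses; yes_given_h: P(answer=yes | h)."""
    p_yes = sum(b * y for b, y in zip(belief, yes_given_h))
    expected_posterior_entropy = 0.0
    for ans, p_ans in (("yes", p_yes), ("no", 1 - p_yes)):
        if p_ans == 0:
            continue
        post = [b * (y if ans == "yes" else 1 - y) / p_ans
                for b, y in zip(belief, yes_given_h)]
        expected_posterior_entropy += p_ans * entropy(post)
    return entropy(belief) - expected_posterior_entropy

belief = [0.25, 0.25, 0.25, 0.25]
# A question that cleanly splits the four hypotheses in half: 1 bit.
print(info_gain(belief, [1, 1, 0, 0]))    # 1.0
# A question true of almost every hypothesis: nearly useless.
print(info_gain(belief, [1, 1, 1, 0.9]))  # close to 0
```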

2 citations

Posted Content
TL;DR: This paper proposes Contrastive Policy Learning (ConPoLe), which explicitly optimizes the InfoNCE loss, a lower bound on the mutual information between the current state and next states that continue on a path to the solution.
Abstract: Abstract symbolic reasoning, as required in domains such as mathematics and logic, is a key component of human intelligence. Solvers for these domains have important applications, especially to computer-assisted education. But learning to solve symbolic problems is challenging for machine learning algorithms. Existing models either learn from human solutions or use hand-engineered features, making them expensive to apply in new domains. In this paper, we instead consider symbolic domains as simple environments where states and actions are given as unstructured text, and binary rewards indicate whether a problem is solved. This flexible setup makes it easy to specify new domains, but search and planning become challenging. We introduce four environments inspired by the Mathematics Common Core Curriculum, and observe that existing Reinforcement Learning baselines perform poorly. We then present a novel learning algorithm, Contrastive Policy Learning (ConPoLe), that explicitly optimizes the InfoNCE loss, which lower-bounds the mutual information between the current state and next states that continue on a path to the solution. ConPoLe successfully solves all four domains. Moreover, problem representations learned by ConPoLe enable accurate prediction of the categories of problems in a real mathematics curriculum. Our results suggest new directions for reinforcement learning in symbolic domains, as well as applications to mathematics education.
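
The InfoNCE objective at the heart of ConPoLe can be sketched directly. In the toy Python below, the true next state on a solution path is scored against negative successor states; the dot-product scorer and hand-picked embeddings are stand-ins for the paper's learned network.

```python
import math

def info_nce(state, positive, negatives, score):
    """-log( e^{s(state,pos)} / sum over {pos} + negs of e^{s(state,x)} )."""
    logits = [score(state, positive)] + [score(state, n) for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction, for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)

dot = lambda a, b: sum(x * y for x, y in zip(a, b))

state = (1.0, 0.0)
good_next = (0.9, 0.1)                    # continues toward the solution
bad_nexts = [(-1.0, 0.0), (0.0, -1.0)]    # successors off the solution path
# Low loss: the positive successor already scores highest.
print(info_nce(state, good_next, bad_nexts, dot))
```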

Cited by
Proceedings ArticleDOI
27 Feb 2021
TL;DR: In this paper, a combination of web crawling and type inference is used to build large training sets of compilable C programs, which can be used to train compilers for code-size reduction.
Abstract: A predictive compiler uses properties of a program to decide how to optimize it. The compiler is trained on a collection of programs to derive a model which determines its actions in the face of unknown code. One of the challenges of predictive compilation is how to find good training sets. Regardless of the programming language, the availability of human-made benchmarks is limited. Moreover, current synthesizers produce code that is very different from actual programs, and mining compilable code from open repositories is difficult, due to program dependencies. In this paper, we use a combination of web crawling and type inference to overcome these problems for the C programming language. We use a type reconstructor based on Hindley-Milner's algorithm to produce ANGHABENCH, a virtually unlimited collection of real-world compilable C programs. Although ANGHABENCH programs are not executable, they can be transformed into object files by any C-compliant compiler. Therefore, they can be used to train compilers for code size reduction. We have used thousands of ANGHABENCH programs to train YACOS, a predictive compiler based on LLVM. The version of YACOS autotuned with ANGHABENCH generates binaries for the LLVM test suite over 10% smaller than clang -Oz. It compresses even code impervious to the state-of-the-art Function Sequence Alignment technique published in 2019, as it does not require large binaries to work well.
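
As a cartoon of the "make mined code compilable" step (far simpler than the paper's Hindley-Milner-style type reconstruction), the Python sketch below finds functions a C snippet calls but never defines and emits dummy declarations for them, so the file can be turned into an object file.

```python
import re

def make_compilable(snippet):
    # Identifiers followed by "(" are treated as calls; "int name(" as
    # definitions. A regex stub, only meant to convey the workflow's shape.
    calls = set(re.findall(r"\b(\w+)\s*\(", snippet))
    defined = set(re.findall(r"\bint\s+(\w+)\s*\(", snippet))
    keywords = {"if", "while", "for", "return", "sizeof", "switch"}
    stubs = [f"extern int {f}();" for f in sorted(calls - defined - keywords)]
    return "\n".join(stubs) + "\n" + snippet

mined = """int work(int n) {
  if (n > 0) return helper(n) + 1;
  return 0;
}"""
print(make_compilable(mined))  # prepends: extern int helper();
```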

29 citations

Proceedings ArticleDOI
19 Oct 2020
TL;DR: An infrastructure that provides developers with the means to explore good optimization sequences for LLVM, using code size as the objective function, is described, and it is shown that YaCoS is able to find sequences that improve on clang -Oz by 3.75% on average.
Abstract: The growing popularity of machine learning frameworks and algorithms has greatly contributed to the design and exploration of good code optimization sequences. Yet, in spite of this progress, mainstream compilers still provide users with only a handful of fixed optimization sequences. Finding optimization sequences that are good in general is challenging because the universe of possible sequences is potentially infinite. This paper describes an infrastructure that provides developers with the means to explore this space. Said infrastructure, henceforth called YaCoS, consists of benchmarks, search algorithms, metrics to estimate the distance between programs, and compilation strategies. YaCoS's features let users build learning models that predict, for unknown programs, optimization sequences that are likely to yield good results for them. In this paper, as a case study, we have used YaCoS to find good optimization sequences for LLVM, using code size as the objective function. This study lets us evaluate three feature sets: two variations of the feature vectors proposed by Namolaru et al. in 2010, plus the optimization statistics produced by LLVM. Our results show that YaCoS is able to find sequences that improve on clang -Oz by 3.75% on average. Our experiments do not indicate a dominant feature set out of the three approaches that we have investigated---it is possible to find programs in which one of them is strictly better than the others.
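
The search loop such an infrastructure supports is easy to caricature. In the Python sketch below, random sequences of (real) LLVM pass names are scored by a mock size_of function that stands in for an actual compile-and-measure step; the gains table is pure fiction.

```python
import random

PASSES = ["-inline", "-gvn", "-instcombine", "-simplifycfg",
          "-sroa", "-dce", "-loop-unroll"]

def size_of(seq):
    # Placeholder objective: pretend some passes shrink code and loop
    # unrolling grows it. A real harness would compile and measure.
    gains = {"-dce": -40, "-instcombine": -25, "-simplifycfg": -20,
             "-loop-unroll": +60}
    return 10_000 + sum(gains.get(p, -5) for p in seq)

def random_search(n_trials, seq_len, seed=0):
    rng = random.Random(seed)
    best = min((tuple(rng.choices(PASSES, k=seq_len)) for _ in range(n_trials)),
               key=size_of)
    return best, size_of(best)

seq, size = random_search(n_trials=500, seq_len=6)
print(size, " ".join(seq))
```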

10 citations

Journal ArticleDOI
13 Nov 2020
TL;DR: This paper proposes an approach to further the specialization of dynamic language compilers, by disentangling classes of behaviors into separate optimization units, and describes a compiler for the R language which uses this approach.
Abstract: In order to generate efficient code, dynamic language compilers often need information, such as dynamic types, not readily available in the program source. Leveraging a mixture of static and dynamic information, these compilers speculate on the missing information. Within one compilation unit, they specialize the generated code to the previously observed behaviors, betting that past is prologue. When speculation fails, the execution must jump back to unoptimized code. In this paper, we propose an approach to further this specialization by disentangling classes of behaviors into separate optimization units. With contextual dispatch, functions are versioned and each version is compiled under different assumptions. When a function is invoked, the implementation dispatches to a version optimized under assumptions matching the dynamic context of the call. As a proof-of-concept, we describe a compiler for the R language which uses this approach. Our implementation is, on average, 1.7× faster than the GNU R reference implementation. We evaluate contextual dispatch on a set of benchmarks and measure additional speedup, on top of traditional speculation with deoptimization techniques. In this setting, contextual dispatch improves the performance of 18 out of 46 programs in our benchmark suite.
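
Here is a toy Python rendering (not the paper's R compiler) of contextual dispatch: keep several versions of a function, each valid under assumptions, and dispatch on the dynamic context of the call. The predicates and versions are invented for illustration.

```python
def add_generic(a, b):
    # Slow path: handles anything the language allows.
    return a + b

def add_int_fast(a, b):
    # Version "compiled" under the assumption both arguments are ints;
    # a real compiler would emit unboxed machine arithmetic here.
    return a + b

VERSIONS = [
    (lambda a, b: type(a) is int and type(b) is int, add_int_fast),
    (lambda a, b: True, add_generic),  # fallback, no assumptions
]

def add(a, b):
    # Dispatch to the first version whose assumptions match this call.
    for matches, version in VERSIONS:
        if matches(a, b):
            return version(a, b)

print(add(2, 3))      # dispatches to the int-specialized version
print(add("a", "b"))  # falls back to the generic version
```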

8 citations

Proceedings ArticleDOI
27 Sep 2021
TL;DR: In this paper, the benefits of inlining for code size reduction are evaluated on MiBench, showing that inlining enables context-sensitive optimizations that reduce code, while preserving the performance gains of LLVM's standard inlining decisions.
Abstract: Function inlining is a compiler optimization that replaces the call of a function with its body. Inlining is typically seen as an optimization that improves performance at the expense of increasing code size. This paper goes against this intuition, and shows that inlining can be employed, in specific situations, as a way to reduce code size. Towards this end, we bring forward two results. First, we gauge the benefits of a trivial heuristic for code-size reduction: the inlining of functions that are invoked at only one call site in the program, followed by the elimination of the original callee. Second, we present and evaluate an analysis that identifies call sites where inlining enables context-sensitive optimizations that reduce code. We have implemented all these techniques in the LLVM compilation infrastructure. When applied to MiBench, our inlining heuristics yield an average code size reduction of 2.96%, reaching 11% in the best case, over clang -Os. Moreover, our techniques preserve the performance gains of LLVM's standard inlining decisions on MiBench: there is no statistically significant difference in the running time of code produced by these different approaches.
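
The single-call-site heuristic is simple enough to model directly. The Python sketch below runs it over a made-up toy IR, splicing the callee body into its sole caller and deleting the original; none of this is LLVM's actual representation.

```python
from collections import Counter

# A made-up program: each function is a list of "instructions", where
# ("call", f) invokes function f.
program = {
    "main":  [("op",), ("call", "once"), ("call", "twice"), ("call", "twice")],
    "once":  [("op",), ("op",)],
    "twice": [("op",)],
}

def inline_single_callsite(prog):
    calls = Counter(i[1] for body in prog.values() for i in body if i[0] == "call")
    for f, n in list(calls.items()):
        if n != 1:
            continue  # inlining here could duplicate code
        for body in prog.values():
            if ("call", f) in body:
                i = body.index(("call", f))
                body[i:i + 1] = prog[f]  # splice callee body into caller
        del prog[f]  # the sole call is gone; drop the original callee
    return prog

print(inline_single_callsite(program))  # "once" is inlined and removed
```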

7 citations

Proceedings ArticleDOI
01 Nov 2018
TL;DR: The design and implementation of a suite of static analyses and code generation techniques to annotate programs with OpenMP pragmas for task parallelism are described, and it is shown that this suite can annotate large and convoluted programs, often replicating the performance gains of handmade annotation.
Abstract: This paper describes the design and implementation of a suite of static analyses and code generation techniques to annotate programs with OpenMP pragmas for task parallelism. These techniques approximate the ranges covered by memory regions, bound recursive tasks and estimate the profitability of tasks. We have used these ideas to implement a source-to-source compiler that inserts OpenMP pragmas into C/C++ programs without any human intervention. By building on the static program analysis literature, and relying on OpenMP's runtime ability to disambiguate pointers, we show that we can annotate large and convoluted programs, often replicating the performance gains of handmade annotation. Furthermore, our techniques give us the means to discover opportunities for parallelism that remained buried in the syntax of well-known benchmarks for many years, sometimes leading to up to four-fold speedups on a 12-core machine at zero programming cost.
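
To convey the shape of the source-to-source flow (with none of the paper's actual analyses), the Python sketch below wraps calls in OpenMP task pragmas when a fake cost estimate crosses a profitability threshold; the function names, costs, and threshold are all invented.

```python
import re

# Pretend cost estimates for callees; a real tool derives these from
# static analysis, bounding recursion and memory ranges.
EST_COST = {"solve_left": 5_000, "solve_right": 5_000, "tiny_fixup": 3}
THRESHOLD = 1_000  # below this, task-spawn overhead is not worth it

def annotate(line):
    # Match a simple call statement like "  f(args);".
    m = re.match(r"(\s*)(\w+)\s*\(.*\);\s*$", line)
    if m and EST_COST.get(m.group(2), 0) >= THRESHOLD:
        return f"{m.group(1)}#pragma omp task\n{line}"
    return line

src = ["  solve_left(a, lo, mid);", "  solve_right(a, mid, hi);",
       "  tiny_fixup(a);"]
print("\n".join(annotate(l) for l in src))  # only the big calls get tasks
```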

6 citations