Open Access Journal Article (DOI)

Multi-modal program inference: a marriage of pre-trained language models and component-based synthesis

TL;DR: In this article, a multi-modal program synthesis approach that combines machine-learned pre-trained models (PTMs) with component-based synthesis (CBS) is presented.
Abstract
Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms, such as a combination of natural language and examples. Examples provide a precise but incomplete specification, and natural language provides an ambiguous but more "complete" task description. Machine-learned pre-trained models (PTMs) are adept at handling ambiguous natural language, but struggle with generating syntactically and semantically precise code. Program synthesis techniques can generate correct code, often even from incomplete but precise specifications such as examples, but they are unable to work with the ambiguity of natural language. We present an approach that combines PTMs with component-based synthesis (CBS): PTMs are used to generate candidate programs from the natural language description of the task, which are then used to guide the CBS procedure to find the program that matches the precise examples-based specification. We use our combination approach to instantiate multi-modal synthesis systems for two programming domains: the domain of regular expressions and the domain of CSS selectors. Our evaluation demonstrates the effectiveness of our domain-agnostic approach in comparison to a state-of-the-art specialized system, and the generality of our approach in providing multi-modal program synthesis from natural language and examples in different programming domains.
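The abstract describes a concrete pipeline: a PTM proposes candidate programs from the natural language description, and CBS searches over their components for a program consistent with the examples. The Python sketch below illustrates that idea for the regex domain under stated assumptions: `ptm_candidates` is a hypothetical stand-in for a real model query, and the component extraction and enumeration are deliberately naive. This is an illustration of the combination, not the paper's implementation.

```python
# A minimal sketch of the PTM+CBS idea for the regex domain: candidates from
# a pre-trained model seed a component pool, and enumerative search finds a
# composition that satisfies the examples.
import re
from itertools import product

def ptm_candidates(description):
    # Hypothetical stand-in for a PTM queried with the NL description.
    return [r"\d+", r"[a-z]+\d+", r"\d{3}"]

def components(candidates):
    # Break candidate regexes into reusable fragments (very naive split).
    frags = set()
    for c in candidates:
        frags.update(re.findall(r"\\d\{\d\}|\\d\+|\[[^\]]+\]\+?", c))
    return frags

def satisfies(regex, positives, negatives):
    try:
        pat = re.compile(regex)
    except re.error:
        return False
    return (all(pat.fullmatch(s) for s in positives)
            and not any(pat.fullmatch(s) for s in negatives))

def synthesize(description, positives, negatives, max_parts=2):
    frags = sorted(components(ptm_candidates(description)))
    # Enumerate concatenations of up to `max_parts` components (CBS step),
    # returning the first program consistent with the example-based spec.
    for n in range(1, max_parts + 1):
        for combo in product(frags, repeat=n):
            candidate = "".join(combo)
            if satisfies(candidate, positives, negatives):
                return candidate
    return None

print(synthesize("lines with letters then a number",
                 positives=["abc123", "x9"], negatives=["123", "abc"]))
```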


Citations
Proceedings Article (DOI)

Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models

TL;DR: A natural language code synthesis tool, GenLine, backed by a large generative language model and a set of task-specific prompts that create or change code, is presented; the findings indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges.
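As a rough illustration of prompt-driven code generation of this kind (the template below is hypothetical, not GenLine's actual prompt format), a task-specific prompt might pair natural language commands with code so the model continues the pattern:

```python
# A hedged illustration of a few-shot, task-specific prompt for code changes:
# NL commands paired with CSS, with the new request appended for completion.
FEW_SHOT = (
    "/* command: make the heading red */\n"
    "h1 { color: red; }\n"
    "/* command: center the button */\n"
    "button { display: block; margin: 0 auto; }\n"
)

def build_prompt(command):
    # Append the new command; the model is expected to continue with CSS.
    return FEW_SHOT + f"/* command: {command} */\n"

print(build_prompt("hide the sidebar"))
```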
Journal Article (DOI)

Interactive Code Generation via Test-Driven User-Intent Formalization

TL;DR: This paper proposes the workflow of test-driven user-intent formalization (TDUIF), which leverages lightweight user feedback to jointly formalize the user intent as tests (a partial specification) and generate code that meets the formal user intent.
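A minimal sketch of the test-driven idea, under the assumption that user intent is expressed as executable assertions that filter generated candidates; `intent_tests` and the candidate list below are illustrative, not TDUIF's API:

```python
# Tests act as a partial, executable specification: only generated candidates
# that pass all of them are surfaced to the user.
def intent_tests(f):
    assert f("racecar") is True
    assert f("hello") is False

candidates = [
    lambda s: "e" in s,       # spurious candidate, rejected by the tests
    lambda s: s == s[::-1],   # palindrome check, meets the intent
]

for i, cand in enumerate(candidates):
    try:
        intent_tests(cand)
        print(f"candidate {i} meets the formalized intent")
        break
    except AssertionError:
        print(f"candidate {i} rejected by the tests")
```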
Journal Article (DOI)

Using Transfer Learning for Code-Related Tasks

TL;DR: The T5 model is assessed in supporting four different code-related tasks: (i) automatic bug-fixing, (ii) injection of code mutants, (iii) generation of assert statements, and (iv) code summarization.
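For readers unfamiliar with the setup, the sketch below shows the standard Hugging Face interface for text-to-text inference with T5; the task prefixes are illustrative, not the paper's fine-tuned prompts, and the public base checkpoint merely demonstrates the API:

```python
# One T5 model can serve several code-related tasks, distinguished by a
# text prefix prepended to the input (standard text-to-text framing).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def run_task(prefix, code):
    inputs = tok(prefix + ": " + code, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=48)
    return tok.decode(out[0], skip_special_tokens=True)

# Fine-tuned prefixes might be "fix bug", "generate assert", "summarize";
# with the base checkpoint this only demonstrates the mechanics.
print(run_task("summarize", "int add(int a, int b) { return a + b; }"))
```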
Journal Article (DOI)

Improving automatically generated code from Codex via Automated Program Repair

TL;DR: This study systematically examines whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests, revealing that automatically generated code shares some common programming mistakes with human-crafted solutions and indicating that existing APR tools have the potential to fix auto-generated code.
Proceedings Article (DOI)

Automated Repair of Programs from Large Language Models

TL;DR: Zhang et al. systematically studied whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests, and found that APR techniques may have the potential to fix auto-generated code.
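A schematic sketch of the studied pipeline (generate, test, repair); `llm_generate` and `apr_repair` are hypothetical stand-ins for a Codex-style model and an APR tool, not the study's tooling:

```python
# Run the model's solution against the tests; hand failing programs to APR.
def run_tests(program, tests):
    try:
        env = {}
        exec(program, env)
        return all(env["solve"](x) == y for x, y in tests)
    except Exception:
        return False

def llm_generate(task):
    # Stand-in for a Codex-style completion; deliberately buggy (off by one).
    return "def solve(n):\n    return n * n + 1\n"

def apr_repair(program, tests):
    # Stand-in for an APR tool: here, a trivial mutation-based search.
    for patch in [program.replace("+ 1", ""), program.replace("+ 1", "- 1")]:
        if run_tests(patch, tests):
            return patch
    return None

tests = [(2, 4), (3, 9)]
prog = llm_generate("square a number")
if not run_tests(prog, tests):
    fixed = apr_repair(prog, tests)
    print("repaired:" if fixed else "unrepaired:", fixed or prog)
```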
References
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
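The masked-language-model objective at the heart of BERT's pre-training can be probed with the standard Hugging Face fill-mask pipeline; the sketch below uses the public base checkpoint:

```python
# BERT predicts a masked token from both left and right context.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Programs are synthesized from natural [MASK] descriptions.")[:3]:
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```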
Proceedings Article (DOI)

Mining association rules between sets of items in large databases

TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
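The core support/confidence computation behind association rules is small enough to sketch; the paper's contribution is the efficient candidate generation, buffer management, and pruning built on top of this basic idea:

```python
# A rule X -> Y is significant if the itemset {X, Y} has enough support
# (frequency across transactions) and the rule has enough confidence.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

for x, y in combinations(["bread", "milk", "butter"], 2):
    sup = support({x, y})
    conf = sup / support({x})
    if sup >= 0.5 and conf >= 0.6:
        print(f"{x} -> {y}  support={sup:.2f} confidence={conf:.2f}")
```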
Journal Article (DOI)

A statistical interpretation of term specificity and its application in retrieval

TL;DR: It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms.
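The collection-frequency weighting the paper argues for is what later became known as inverse document frequency (idf); a minimal sketch:

```python
# Rarer (more specific) terms get higher weight, so matches on them
# count for more than matches on frequent terms.
import math

docs = [
    "regex synthesis from examples",
    "program synthesis from natural language",
    "regex matching engines",
]

def idf(term):
    df = sum(term in d.split() for d in docs)
    # Collection-frequency weighting: log(N / df); rarer terms score higher.
    return math.log(len(docs) / df) if df else 0.0

for term in ["synthesis", "regex", "matching"]:
    print(f"idf({term}) = {idf(term):.2f}")
```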
Journal Article (DOI)

Learning regular sets from queries and counterexamples

TL;DR: In this article, the problem of identifying an unknown regular set from examples of its members and non-members is addressed; the regular set is presented by a minimally adequate teacher, which can answer membership queries about the set and can also test a conjecture, indicating whether it is equal to the unknown set and providing a counterexample if not.
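A toy sketch of the query interface in Angluin's model: the learner may ask membership queries and propose conjectures, which the teacher accepts or refutes with a counterexample. The "teacher" below approximates equivalence checking over a finite probe set; the full L* observation-table algorithm is omitted:

```python
# Unknown regular set: strings with an even number of 'a's.
TARGET = lambda s: s.count("a") % 2 == 0

def membership(word):
    # Membership query: is this word in the unknown set?
    return TARGET(word)

def equivalence(conjecture, probe_words):
    # Approximate teacher: search a finite probe set for a counterexample.
    for w in probe_words:
        if conjecture(w) != TARGET(w):
            return w
    return None

probes = ["", "a", "aa", "ab", "ba", "aab"]
guess = lambda s: True   # trivial first conjecture: accept everything
print("membership('aa') =", membership("aa"))
print("counterexample to first conjecture:", equivalence(guess, probes))
```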