Open Access Journal Article (DOI)

Multi-modal program inference: a marriage of pre-trained language models and component-based synthesis

TL;DR: In this article, a multi-modal program synthesis approach that combines machine-learned pre-trained models (PTMs) with component-based synthesis (CBS) is presented.
Abstract
Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms, such as a combination of natural language and examples. Examples provide a precise but incomplete specification, and natural language provides an ambiguous but more "complete" task description. Machine-learned pre-trained models (PTMs) are adept at handling ambiguous natural language, but struggle with generating syntactically and semantically precise code. Program synthesis techniques can generate correct code, often even from incomplete but precise specifications such as examples, but they are unable to work with the ambiguity of natural language. We present an approach that combines PTMs with component-based synthesis (CBS): PTMs are used to generate candidate programs from the natural language description of the task, which are then used to guide the CBS procedure to find the program that matches the precise examples-based specification. We use our combination approach to instantiate multi-modal synthesis systems for two programming domains: the domain of regular expressions and the domain of CSS selectors. Our evaluation demonstrates the effectiveness of our domain-agnostic approach in comparison to a state-of-the-art specialized system, and the generality of our approach in providing multi-modal program synthesis from natural language and examples in different programming domains.
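The abstract describes a concrete pipeline: a PTM proposes candidate programs from the natural language description, and CBS searches over their components for a program consistent with the examples. The Python sketch below illustrates that idea for the regex domain under stated assumptions: `ptm_candidates` is a hypothetical stand-in for a real model query, and the component extraction and enumeration are deliberately naive. This is an illustration of the combination, not the paper's implementation.

```python
# A minimal sketch of the PTM+CBS idea for the regex domain: candidates from
# a pre-trained model seed a component pool, and enumerative search finds a
# composition that satisfies the examples.
import re
from itertools import product

def ptm_candidates(description):
    # Hypothetical stand-in for a PTM queried with the NL description.
    return [r"\d+", r"[a-z]+\d+", r"\d{3}"]

def components(candidates):
    # Break candidate regexes into reusable fragments (very naive split).
    frags = set()
    for c in candidates:
        frags.update(re.findall(r"\\d\{\d\}|\\d\+|\[[^\]]+\]\+?", c))
    return frags

def satisfies(regex, positives, negatives):
    try:
        pat = re.compile(regex)
    except re.error:
        return False
    return (all(pat.fullmatch(s) for s in positives)
            and not any(pat.fullmatch(s) for s in negatives))

def synthesize(description, positives, negatives, max_parts=2):
    frags = sorted(components(ptm_candidates(description)))
    # Enumerate concatenations of up to `max_parts` components (CBS step),
    # returning the first program consistent with the example-based spec.
    for n in range(1, max_parts + 1):
        for combo in product(frags, repeat=n):
            candidate = "".join(combo)
            if satisfies(candidate, positives, negatives):
                return candidate
    return None

print(synthesize("lines with letters then a number",
                 positives=["abc123", "x9"], negatives=["123", "abc"]))
```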


Citations
Proceedings Article (DOI)

Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models

TL;DR: A natural language code synthesis tool, GenLine, backed by a large generative language model and a set of task-specific prompts that create or change code, is presented; the findings indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges.
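As a rough illustration of prompt-driven code generation of this kind (the template below is hypothetical, not GenLine's actual prompt format), a task-specific prompt might pair natural language commands with code so the model continues the pattern:

```python
# A hedged illustration of a few-shot, task-specific prompt for code changes:
# NL commands paired with CSS, with the new request appended for completion.
FEW_SHOT = (
    "/* command: make the heading red */\n"
    "h1 { color: red; }\n"
    "/* command: center the button */\n"
    "button { display: block; margin: 0 auto; }\n"
)

def build_prompt(command):
    # Append the new command; the model is expected to continue with CSS.
    return FEW_SHOT + f"/* command: {command} */\n"

print(build_prompt("hide the sidebar"))
```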
Journal Article (DOI)

Interactive Code Generation via Test-Driven User-Intent Formalization

TL;DR: This paper proposes the workflow of test-driven user-intent formalization (TDUIF), which leverages lightweight user feedback to jointly formalize the user intent as tests (a partial specification) and generate code that meets the formal user intent.
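A minimal sketch of the test-driven idea, under the assumption that user intent is expressed as executable assertions that filter generated candidates; `intent_tests` and the candidate list below are illustrative, not TDUIF's API:

```python
# Tests act as a partial, executable specification: only generated candidates
# that pass all of them are surfaced to the user.
def intent_tests(f):
    assert f("racecar") is True
    assert f("hello") is False

candidates = [
    lambda s: "e" in s,       # spurious candidate, rejected by the tests
    lambda s: s == s[::-1],   # palindrome check, meets the intent
]

for i, cand in enumerate(candidates):
    try:
        intent_tests(cand)
        print(f"candidate {i} meets the formalized intent")
        break
    except AssertionError:
        print(f"candidate {i} rejected by the tests")
```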
Journal Article (DOI)

Using Transfer Learning for Code-Related Tasks

TL;DR: The T5 model is assessed in supporting four different code-related tasks: (i) automatic bug-fixing, (ii) injection of code mutants, (iii) generation of assert statements, and (iv) code summarization.
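For readers unfamiliar with the setup, the sketch below shows the standard Hugging Face interface for text-to-text inference with T5; the task prefixes are illustrative, not the paper's fine-tuned prompts, and the public base checkpoint merely demonstrates the API:

```python
# One T5 model can serve several code-related tasks, distinguished by a
# text prefix prepended to the input (standard text-to-text framing).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def run_task(prefix, code):
    inputs = tok(prefix + ": " + code, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=48)
    return tok.decode(out[0], skip_special_tokens=True)

# Fine-tuned prefixes might be "fix bug", "generate assert", "summarize";
# with the base checkpoint this only demonstrates the mechanics.
print(run_task("summarize", "int add(int a, int b) { return a + b; }"))
```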
Journal Article (DOI)

Improving automatically generated code from Codex via Automated Program Repair

TL;DR: This study systematically examines whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests, revealing that automatically generated code shares some common programming mistakes with human-crafted solutions and indicating that existing APR tools have the potential to fix auto-generated code.
Proceedings Article (DOI)

Automated Repair of Programs from Large Language Models

TL;DR: Zhang et al. systematically studied whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests, and found that APR techniques may have the potential to fix auto-generated code.
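A schematic sketch of the studied pipeline (generate, test, repair); `llm_generate` and `apr_repair` are hypothetical stand-ins for a Codex-style model and an APR tool, not the study's tooling:

```python
# Run the model's solution against the tests; hand failing programs to APR.
def run_tests(program, tests):
    try:
        env = {}
        exec(program, env)
        return all(env["solve"](x) == y for x, y in tests)
    except Exception:
        return False

def llm_generate(task):
    # Stand-in for a Codex-style completion; deliberately buggy (off by one).
    return "def solve(n):\n    return n * n + 1\n"

def apr_repair(program, tests):
    # Stand-in for an APR tool: here, a trivial mutation-based search.
    for patch in [program.replace("+ 1", ""), program.replace("+ 1", "- 1")]:
        if run_tests(patch, tests):
            return patch
    return None

tests = [(2, 4), (3, 9)]
prog = llm_generate("square a number")
if not run_tests(prog, tests):
    fixed = apr_repair(prog, tests)
    print("repaired:" if fixed else "unrepaired:", fixed or prog)
```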
References
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
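The masked-language-model objective at the heart of BERT's pre-training can be probed with the standard Hugging Face fill-mask pipeline; the sketch below uses the public base checkpoint:

```python
# BERT predicts a masked token from both left and right context.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Programs are synthesized from natural [MASK] descriptions.")[:3]:
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```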
Proceedings Article (DOI)

Mining association rules between sets of items in large databases

TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
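The core support/confidence computation behind association rules is small enough to sketch; the paper's contribution is the efficient candidate generation, buffer management, and pruning built on top of this basic idea:

```python
# A rule X -> Y is significant if the itemset {X, Y} has enough support
# (frequency across transactions) and the rule has enough confidence.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

for x, y in combinations(["bread", "milk", "butter"], 2):
    sup = support({x, y})
    conf = sup / support({x})
    if sup >= 0.5 and conf >= 0.6:
        print(f"{x} -> {y}  support={sup:.2f} confidence={conf:.2f}")
```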
Journal Article (DOI)

A statistical interpretation of term specificity and its application in retrieval

TL;DR: It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms.
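The collection-frequency weighting the paper argues for is what later became known as inverse document frequency (idf); a minimal sketch:

```python
# Rarer (more specific) terms get higher weight, so matches on them
# count for more than matches on frequent terms.
import math

docs = [
    "regex synthesis from examples",
    "program synthesis from natural language",
    "regex matching engines",
]

def idf(term):
    df = sum(term in d.split() for d in docs)
    # Collection-frequency weighting: log(N / df); rarer terms score higher.
    return math.log(len(docs) / df) if df else 0.0

for term in ["synthesis", "regex", "matching"]:
    print(f"idf({term}) = {idf(term):.2f}")
```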
Journal Article (DOI)

Learning regular sets from queries and counterexamples

TL;DR: In this article, the problem of identifying an unknown regular set from examples of its members and non-members is addressed; the regular set is presented by a minimally adequate teacher, which can answer membership queries about the set and can also test a conjecture, indicating whether it is equal to the unknown set and providing a counterexample if not.
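A toy sketch of the query interface in Angluin's model: the learner may ask membership queries and propose conjectures, which the teacher accepts or refutes with a counterexample. The "teacher" below approximates equivalence checking over a finite probe set; the full L* observation-table algorithm is omitted:

```python
# Unknown regular set: strings with an even number of 'a's.
TARGET = lambda s: s.count("a") % 2 == 0

def membership(word):
    # Membership query: is this word in the unknown set?
    return TARGET(word)

def equivalence(conjecture, probe_words):
    # Approximate teacher: search a finite probe set for a counterexample.
    for w in probe_words:
        if conjecture(w) != TARGET(w):
            return w
    return None

probes = ["", "a", "aa", "ab", "ba", "aab"]
guess = lambda s: True   # trivial first conjecture: accept everything
print("membership('aa') =", membership("aa"))
print("counterexample to first conjecture:", equivalence(guess, probes))
```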