Open Access Journal Article (DOI)

A Survey of Machine Learning for Big Code and Naturalness

TLDR
This article surveys research at the intersection of machine learning, programming languages, and software engineering, which has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code.
Abstract
Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.
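To make the idea of a learnable probabilistic model of source code concrete, the sketch below (not taken from the survey) trains a bigram language model over code tokens and scores a new snippet; the regex tokenizer, the toy corpus, and the smoothing constant alpha are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the survey's method): a bigram language model over code tokens.
import math
import re
from collections import Counter, defaultdict

def tokenize(code: str):
    # Naive lexer: identifiers/keywords, numbers, and single punctuation characters become tokens.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

class BigramCodeModel:
    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha                   # additive smoothing constant (assumed value)
        self.bigrams = defaultdict(Counter)  # previous token -> Counter of next tokens
        self.vocab = set()

    def train(self, corpus):
        for snippet in corpus:
            tokens = ["<s>"] + tokenize(snippet) + ["</s>"]
            self.vocab.update(tokens)
            for prev, nxt in zip(tokens, tokens[1:]):
                self.bigrams[prev][nxt] += 1

    def log_prob(self, snippet: str) -> float:
        tokens = ["<s>"] + tokenize(snippet) + ["</s>"]
        v = len(self.vocab) or 1
        total = 0.0
        for prev, nxt in zip(tokens, tokens[1:]):
            counts = self.bigrams[prev]
            total += math.log((counts[nxt] + self.alpha) / (sum(counts.values()) + self.alpha * v))
        return total

model = BigramCodeModel()
model.train(["for i in range(n): total += x[i]", "for j in range(m): s += y[j]"])
print(model.log_prob("for k in range(p): acc += z[k]"))  # idiomatic code scores relatively high
```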


Citations
Posted Content

Open Graph Benchmark: Datasets for Machine Learning on Graphs

TL;DR: The OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover domains ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs, indicating fruitful opportunities for future research.
Posted Content

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search.

TL;DR: This paper describes the methodology used to obtain the corpus and the expert labels, as well as a number of simple baseline solutions for the task.
Proceedings Article (DOI)

A novel neural source code representation based on abstract syntax tree

TL;DR: This paper proposes a novel AST-based Neural Network (ASTNN) for source code representation that splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements.
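A rough sketch of the splitting step, assuming Python's built-in ast module as a stand-in parser; ASTNN itself targets other languages and encodes each statement tree with a recursive neural network rather than the bag-of-node-types placeholder used here.

```python
# Minimal sketch: split a function into per-statement subtrees, roughly in ASTNN's spirit.
import ast

def statement_subtrees(code: str):
    """Yield every statement node of the parsed code as its own subtree."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            yield node

def bag_of_node_types(stmt: ast.stmt):
    # Placeholder "encoder": represent a statement tree by its multiset of node types.
    # ASTNN instead learns a vector for each statement tree with a recursive encoder.
    return sorted(type(n).__name__ for n in ast.walk(stmt))

code = "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total"
for stmt in statement_subtrees(code):
    print(type(stmt).__name__, bag_of_node_types(stmt))
```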
Journal Article (DOI)

SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

TL;DR: This paper devises, implements, and evaluates a technique, called SEQUENCER, for fixing bugs based on sequence-to-sequence learning on source code, which captures a wide range of repair operators without any domain-specific top-down design.
Proceedings Article (DOI)

A neural model for generating natural language summaries of program subroutines

TL;DR: This paper presents a neural model that combines words from code with code structure from an AST, which allows the model to learn code structure independently of the text in code.
References
Journal Article (DOI)

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
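For reference, here is one LSTM cell step written out in NumPy with the standard gating equations; the weight matrices and the toy input sequence are random placeholders, not a trained model.

```python
# Minimal sketch of one LSTM cell step (standard gating; weights are random placeholders).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step: gates control what the constant-error carousel c keeps or overwrites."""
    z = W @ x + U @ h_prev + b              # stacked pre-activations for the four gates
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g                  # cell state: additive update preserves error flow
    h = o * np.tanh(c)                      # hidden state exposed to the rest of the network
    return h, c

n_in, n_hidden = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):        # unroll over a toy sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```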
Proceedings Article (DOI)

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, correlates highly with human evaluation, and has little marginal cost per run.
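A toy, single-reference rendering of the metric (clipped n-gram precision, geometric mean, brevity penalty); production evaluations use corpus-level BLEU, usually with smoothing, so this is only an illustration of the arithmetic.

```python
# Minimal sketch of sentence-level BLEU with a single reference.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())  # clipped matches
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log(max(clipped, 1e-9) / total))
    # Brevity penalty stops very short candidates from winning on precision alone.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat is on the mat", max_n=2), 3))  # 0.707
print(round(bleu("the cat sat on the mat", "the cat is on the mat"), 3))  # ~0: no 4-gram match, hence smoothing in practice
```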
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and the paper proposes to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
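A minimal NumPy sketch of the soft-search idea: every encoder state is scored against the current decoder state, the scores are normalized with a softmax, and the weights produce a context vector in place of a single fixed-length summary. The additive scoring form follows the paper, but the shapes and random weights here are illustrative.

```python
# Minimal sketch of additive (soft) attention over encoder states.
import numpy as np

def soft_attention(decoder_state, encoder_states, Wa, Ua, va):
    """Score each source position, normalize with softmax, return the weighted context vector."""
    scores = np.array([va @ np.tanh(Wa @ decoder_state + Ua @ h) for h in encoder_states])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # soft alignment over source positions
    context = weights @ encoder_states       # convex combination replaces the fixed-length vector
    return context, weights

rng = np.random.default_rng(0)
d_enc, d_dec, d_att, src_len = 6, 6, 8, 5
encoder_states = rng.normal(size=(src_len, d_enc))
decoder_state = rng.normal(size=d_dec)
Wa = rng.normal(size=(d_att, d_dec))
Ua = rng.normal(size=(d_att, d_enc))
va = rng.normal(size=d_att)
context, weights = soft_attention(decoder_state, encoder_states, Wa, Ua, va)
print(weights.round(2), context.shape)
```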
Posted Content

Sequence to Sequence Learning with Neural Networks

TL;DR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence which made the optimization problem easier.
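A small illustration of why reversing the source helps, under the simplifying assumption of a word-for-word positional alignment: it shortens the lag between reading an early source word and emitting the matching early target word.

```python
# Toy calculation (illustrative positional alignment, not a trained model).
def lag(src_len, i, reverse):
    """Steps between the encoder reading source word i and the decoder emitting target word i."""
    read_step = (src_len - 1 - i) if reverse else i  # when word i is fed to the encoder
    emit_step = src_len + i                          # target word i comes after the full source
    return emit_step - read_step

print([lag(5, i, reverse=False) for i in range(5)])  # [5, 5, 5, 5, 5] -> every dependency is long
print([lag(5, i, reverse=True) for i in range(5)])   # [1, 3, 5, 7, 9] -> early dependencies become short
```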
Journal Article (DOI)

Anomaly detection: A survey

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.