Proceedings ArticleDOI

Retrieval on source code: a neural code search

TLDR
This paper investigates the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e. without having a curated Q&A forum such as Stack Overflow at hand.
Abstract
Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers can express their intent in natural language, similar to the interaction that Stack Overflow supports. In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e. without having a curated Q&A forum such as Stack Overflow at hand. Our experiments using a benchmark suite derived from Stack Overflow and GitHub repositories show promising results. We find that while a basic word-embedding based search procedure works acceptably, better results can be obtained by adding a layer of supervision, as well as by a customized ranking strategy.
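As a rough illustration of the unsupervised baseline the abstract describes, the sketch below embeds a query and each code document as averaged word vectors and ranks documents by cosine similarity. The embedding table, corpus, and dimensions are toy stand-ins, not the paper's data or trained vectors.

```python
# Minimal sketch of unsupervised embedding-based code search: embed the
# query and each code document as averaged word vectors, rank by cosine.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["open", "file", "read", "socket", "connect", "close"]
EMBEDDINGS = {w: rng.normal(size=100) for w in VOCAB}  # stand-in for trained vectors

def embed(tokens):
    """Average the embeddings of known tokens into one document vector."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)

def search(query_tokens, corpus, k=2):
    """Rank code documents (token lists) by cosine similarity to the query."""
    q = embed(query_tokens)
    scores = []
    for doc_id, doc_tokens in corpus.items():
        d = embed(doc_tokens)
        denom = np.linalg.norm(q) * np.linalg.norm(d)
        scores.append((doc_id, float(q @ d / denom) if denom else 0.0))
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

corpus = {"snippet_1": ["open", "file", "read", "close"],
          "snippet_2": ["socket", "connect", "close"]}
print(search(["read", "file"], corpus))
```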


Citations
Proceedings ArticleDOI

When deep learning met code search

TL;DR: In this paper, the authors evaluate the performance of supervised techniques for code search using natural language and show that adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much.
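A minimal sketch of what "adding supervision" can mean here: learn a linear map W that pulls a query embedding toward its matching code embedding and away from a mismatched one via a hinge ranking loss. The loss, dimensions, and data below are illustrative simplifications, not the systems the paper evaluates.

```python
# Hedged sketch of a supervised layer over fixed embeddings: a learned
# linear map trained with a margin ranking (hinge) loss.
import numpy as np

rng = np.random.default_rng(1)
d = 100
W = np.eye(d)  # start from the identity, i.e. the unsupervised baseline

def train_step(q, pos, neg, lr=0.01, margin=0.1):
    """One SGD step on the hinge loss max(0, margin - (Wq)·pos + (Wq)·neg).

    q: query embedding; pos: matching code embedding; neg: mismatched one.
    """
    global W
    mapped = W @ q
    loss = max(0.0, margin - mapped @ pos + mapped @ neg)
    if loss > 0.0:
        # d/dW [(Wq)·v] = outer(v, q), so push W toward pos, away from neg.
        W += lr * np.outer(pos - neg, q)
    return loss

q, pos, neg = rng.normal(size=(3, d))  # one toy training triple
print(train_step(q, pos, neg))         # hinge loss for this triple
```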
Proceedings ArticleDOI

NL2Type: inferring JavaScript function types from natural language information

TL;DR: NL2Type is presented, a learning-based approach for predicting likely type signatures of JavaScript functions using a recurrent, LSTM-based neural model that, after learning from an annotated code base, predicts function types for unannotated code.
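A minimal PyTorch sketch of the idea: encode a function's natural-language tokens (name and comment words) with an LSTM and classify them into a type label. Vocabulary size, dimensions, and the label set are placeholders; NL2Type's actual features and architecture differ.

```python
# Sketch: LSTM over natural-language tokens, classified into a type label.
import torch
import torch.nn as nn

class TypePredictor(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_types=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, num_types)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) indices of name/comment tokens
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return self.classify(h_n[-1])  # (batch, num_types) type logits

model = TypePredictor()
logits = model(torch.randint(0, 5000, (2, 12)))  # two token sequences
print(logits.shape)  # torch.Size([2, 50])
```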
Journal ArticleDOI

Aroma: Code Recommendation via Structural Code Search

TL;DR: Aroma, as mentioned in this paper, is a tool and technique for code recommendation via structural code search: it takes a partial code snippet as input, searches the corpus for method bodies containing the partial snippet, and clusters and intersects the search results to recommend a small set of succinct code snippets that both contain the query snippet and appear as part of several methods in the corpus.
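A rough sketch of the search-then-intersect idea, using flat token sets as a stand-in for Aroma's structural features (the real system featurizes parse trees and prunes results to syntactically valid snippets):

```python
# Toy Aroma-style pipeline: retrieve by overlap, then intersect similar
# results to recover the code they share.
def retrieve(query, corpus, k=3):
    """Rank method bodies (token sets) by overlap with the query's tokens."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: len(query & kv[1]), reverse=True)
    return ranked[:k]

def intersect_cluster(snippets):
    """Intersect similar results to recover their common core."""
    return set.intersection(*snippets)

query = {"for", "line", "open", "read"}
corpus = {
    "m1": {"for", "line", "open", "read", "strip"},
    "m2": {"for", "line", "open", "read", "append"},
    "m3": {"socket", "connect", "send"},
}
top = retrieve(query, corpus)
print(intersect_cluster([toks for _, toks in top[:2]]))
# -> the shared core: {'for', 'line', 'open', 'read'}
```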
Posted Content

Adversarial Examples for Models of Code

TL;DR: The main idea of the approach is to force a given trained model to make an incorrect prediction, as specified by the adversary, by introducing small perturbations that do not change the program’s semantics, thereby creating an adversarial example.
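A toy illustration of the attack surface: renaming an identifier preserves the program's semantics but changes the tokens a model sees. The classifier below is a dummy stand-in; the paper searches for such perturbations systematically against real models of code.

```python
# Semantics-preserving perturbation by identifier renaming, tested against
# a dummy token-based classifier (stand-in for a trained model of code).
import re

def rename_variable(source, old, new):
    """Rename one identifier; the program's behavior is unchanged."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)

def model(source):
    """Dummy classifier keyed on surface tokens."""
    return "sort" if "array" in source else "other"

snippet = "def f(array):\n    array.sort()\n    return array"
for candidate in ["arr", "data", "ttype"]:
    perturbed = rename_variable(snippet, "array", candidate)
    if model(perturbed) != model(snippet):
        print("adversarial rename found:", candidate)
        break
```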
References
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
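The phrase-finding method is simple enough to sketch: merge a bigram into a phrase when its parts co-occur far more often than chance, scored as (count(ab) - delta) / (count(a) * count(b)). The threshold, delta, and corpus below are toy values.

```python
# Sketch of the paper's phrase-finding rule over bigram/unigram counts.
from collections import Counter

def find_phrases(sentences, delta=1, threshold=0.1):
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    phrases = []
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.append(f"{a}_{b}")
    return phrases

corpus = [["new", "york", "times"], ["new", "york", "city"],
          ["brand", "new", "car"]]
print(find_phrases(corpus))  # "new_york" scores high; "brand_new" does not
```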
Journal ArticleDOI

Enriching Word Vectors with Subword Information

TL;DR: This paper proposes a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams and a word's vector is the sum of these representations, making it possible to train models on large corpora quickly and to compute representations for words that did not appear in the training data.
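The subword scheme is easy to sketch: wrap a word in boundary markers, extract its character n-grams, and sum the n-gram vectors, so even unseen words get a representation. The n-gram table below is a random stand-in for a trained one.

```python
# Sketch of the subword representation: a word's vector is the sum of its
# character n-gram vectors (boundary markers included).
import numpy as np

rng = np.random.default_rng(2)
DIM = 50
ngram_vectors = {}  # lazily populated stand-in for the trained table

def char_ngrams(word, n_min=3, n_max=6):
    padded = f"<{word}>"
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def word_vector(word):
    """Sum the vectors of the word's character n-grams."""
    total = np.zeros(DIM)
    for g in char_ngrams(word):
        if g not in ngram_vectors:
            ngram_vectors[g] = rng.normal(size=DIM)
        total += ngram_vectors[g]
    return total

print(char_ngrams("where")[:4])   # ['<wh', 'whe', 'her', 'ere']
v = word_vector("whereabouts")    # works even for unseen words
```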
Journal ArticleDOI

Distributional Structure

TL;DR: This discussion describes how a language can be characterized in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and how this description is complete without intrusion of other features such as history or meaning.
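The distributional idea in miniature: characterize each word purely by which words occur near it. The co-occurrence table below is the kind of "occurrence of parts relative to other parts" the article formalizes; the window size and corpus are toy choices.

```python
# Build a simple word co-occurrence table within a fixed window.
from collections import Counter, defaultdict

def cooccurrence(sentences, window=2):
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[w][tokens[j]] += 1
    return counts

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
table = cooccurrence(corpus)
print(table["cat"])  # Counter({'the': 1, 'sat': 1}); 'dog' looks the same
```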
Posted Content

Enriching Word Vectors with Subword Information

TL;DR: A new approach based on the skip-gram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
Posted Content

Billion-scale similarity search with GPUs

TL;DR: In this paper, the authors propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art.
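This preprint describes the work behind the Faiss library; below is a minimal CPU usage sketch with the GPU path the paper optimizes shown in comments, assuming the faiss package is installed and using random data.

```python
# Minimal Faiss usage: exact L2 nearest-neighbor search over random vectors.
import numpy as np
import faiss

d = 64                                                # vector dimensionality
xb = np.random.random((10000, d)).astype("float32")   # database vectors
xq = np.random.random((5, d)).astype("float32")       # query vectors

index = faiss.IndexFlatL2(d)  # exact L2 index on the CPU
# To run on a GPU (the setting the paper optimizes), wrap the index:
#   res = faiss.StandardGpuResources()
#   index = faiss.index_cpu_to_gpu(res, 0, index)
index.add(xb)
distances, ids = index.search(xq, 4)  # 4 nearest neighbors per query
print(ids.shape)                      # (5, 4)
```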