A Survey of Machine Learning for Big Code and Naturalness
Citations
1,429 citations
1,097 citations
Cites methods from "A Survey of Machine Learning for Bi..."
...n the field of machine learning for code not only for its straightforward adoption in developer tools, but also because it is a proxy measure for assessing how well a model captures code semantics (Allamanis et al., 2018). Following Alon et al. (2019, 2018), we use an F1 score to evaluate predicted sub-tokens against ground-truth sub-tokens. The average length of a method name in the ground-truth is 2.6 sub-tokens, ...
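The sub-token F1 described in this excerpt can be sketched as follows. This is a minimal illustration assuming a set-based comparison of sub-tokens; the cited papers may count duplicates or normalize case differently, so treat the exact scheme as an assumption.

```python
def subtoken_f1(predicted, actual):
    """Precision, recall, and F1 over predicted vs. ground-truth method-name sub-tokens.

    Assumes a set-based comparison (duplicate sub-tokens counted once); the
    evaluation in the cited papers may differ in this detail.
    """
    pred, gold = set(predicted), set(actual)
    tp = len(pred & gold)  # sub-tokens appearing in both prediction and ground truth
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# e.g. predicting the name "getCount" for the ground-truth name "getItemCount":
p, r, f = subtoken_f1(["get", "count"], ["get", "item", "count"])
```

Here the prediction's two sub-tokens both appear in the ground truth (precision 1.0), but one ground-truth sub-token is missed (recall 2/3), giving F1 = 0.8.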
[...]
849 citations
Cites background from "A Survey of Machine Learning for Bi..."
...This contrasts with symbolic representations, where each element is uniquely represented with exactly one component [Allamanis et al. 2017]....
[...]
...…et al. 2015], code completion [Mishne et al. 2012; Raychev et al. 2014], code summarization [Allamanis et al. 2016], code generation [Amodio et al. 2017; Lu et al. 2017; Maddison and Tarlow 2014; Murali et al. 2017], and more (see [Allamanis et al. 2017; Vechev and Yahav 2016] for a survey)....
[...]
541 citations
Cites background from "A Survey of Machine Learning for Bi..."
...(1) Source code is structured: In contrast to natural language text, which is weakly structured, programming languages are formal languages, and source code written in them is unambiguous and structured [3]....
[...]
492 citations
Cites background from "A Survey of Machine Learning for Bi..."
...Applying machine learning to code has been widely considered [2]....
[...]
References
72,897 citations
"A Survey of Machine Learning for Bi..." refers background in this paper
...Publication date: July 2018. neural architectures, such as LSTMs [91], GRUs [41], and their variants, have made progress on this problem and can handle moderately long-range dependencies....
[...]
...Koc et al. [106] train a classifier, using LSTMs, to predict if a static analysis warning is a false positive....
[...]
...Following this trend, Karpathy et al. [103] and Cummins et al. [48] use character-level LSTMs [91]....
[...]
...For example, the work of Maddison and Tarlow [126] and other neural language models (e.g., LSTMs in Dam et al. [49]) describe context distributed representations while sequentially generating code....
[...]
38,208 citations
21,126 citations
20,196 citations
"A Survey of Machine Learning for Bi..." refers background in this paper
...g the output, since the quality of the output is rarely quantifiable. A vast literature on non-probabilistic methods exploits data mining methods, such as frequent pattern mining and anomaly detection [190]. We do not discuss these models here, since they are not probabilistic models of code. Classic probabilistic topic models [30], which usually view code (or other software engineering artifacts) as a bag-of-w...
[...]
20,077 citations