Proceedings Article

Learning Program Embeddings to Propagate Feedback on Student Code

06 Jul 2015, pp. 1093–1102
TL;DR: A neural network method is introduced to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and an algorithm for feedback at scale is proposed using these linear maps as features.
Abstract: Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes which can range from thousands to millions of students. We introduce a neural network method to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and propose an algorithm for feedback at scale using these linear maps as features. We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University's CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions.
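
To make the encoding concrete: each program is summarized by a matrix that maps an embedding of the memory state before execution (the precondition) to an embedding of the state after execution (the postcondition), and that matrix is then used as the program's feature vector. The sketch below is a minimal NumPy illustration of this idea, not the authors' model: the toy state embedding, the per-program least-squares fit, and the nearest-neighbor feedback rule are all simplifying assumptions.

```python
import numpy as np

D = 8  # dimensionality of the embedded memory state (hypothetical)

def embed_state(state):
    """Toy state embedding: treat a fixed-length tuple of variable values
    as a vector. The paper learns this embedding jointly with the maps."""
    return np.asarray(state, dtype=float)

def fit_program_matrix(pre_states, post_states):
    """Fit a linear map M with M @ embed(pre) ~= embed(post) by least
    squares over the program's observed precondition/postcondition pairs."""
    A = np.stack([embed_state(s) for s in pre_states])    # (n, D)
    B = np.stack([embed_state(s) for s in post_states])   # (n, D)
    X, *_ = np.linalg.lstsq(A, B, rcond=None)             # solves A @ X ~= B
    return X.T                                            # (D, D) program matrix

def propagate_feedback(labeled_submissions, new_matrix):
    """Reuse the comment attached to the labeled submission whose program
    matrix is closest to the new submission's matrix."""
    matrix, comment = min(
        labeled_submissions,
        key=lambda mc: np.linalg.norm(mc[0].ravel() - new_matrix.ravel()))
    return comment
```

In the paper the maps come from a trained neural encoder rather than a per-program least-squares fit, and the nearest-neighbor rule above merely stands in for the feedback-propagation algorithm; the matrix-as-feature view is the shared idea.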


Citations
Proceedings ArticleDOI
10 Nov 2019
TL;DR: This experience paper identifies three potential downstream tasks, namely code comment generation, code authorship identification, and code clone detection, to which source code token embedding models can be applied, and empirically assesses the recently proposed code2vec token embeddings.
Abstract: Many Natural Language Processing (NLP) tasks, such as sentiment analysis or syntactic parsing, have benefited from the development of word embedding models. In particular, regardless of the training algorithms, the learned embeddings have often been shown to be generalizable to different NLP tasks. In contrast, despite recent momentum on word embeddings for source code, the literature lacks evidence of their generalizability beyond the example task they have been trained for. In this experience paper, we identify three potential downstream tasks, namely code comment generation, code authorship identification, and code clone detection, to which source code token embedding models can be applied. We empirically assess a recently proposed code token embedding model, namely code2vec's token embeddings. Code2vec was trained on the task of predicting method names, and while there is potential for using the vectors it learns on other tasks, this has not been explored in the literature. Therefore, we fill this gap by focusing on its generalizability for the tasks we have identified. Ultimately, we show that source code token embeddings cannot be readily leveraged for the downstream tasks. Our experiments even show that our attempts to use them do not result in any improvements over less sophisticated methods. We call for more research into effective and general use of code embeddings.
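
For a sense of what "readily leveraging" pretrained token embeddings on a downstream task would look like, the sketch below averages token vectors per code snippet and flags clone candidates by cosine similarity; this is the kind of simple baseline such studies compare against. The embedding file format, the token lists, and the threshold are assumptions for illustration, not code2vec's actual artifacts.

```python
import numpy as np

def load_token_embeddings(path):
    """Load a whitespace-separated 'token v1 v2 ...' embedding file
    (format assumed; adapt to the actual exported embedding table)."""
    table = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            token, *vals = line.rstrip().split(" ")
            table[token] = np.array(vals, dtype=float)
    return table

def snippet_vector(tokens, table):
    """Average the embeddings of the snippet's known tokens."""
    vecs = [table[t] for t in tokens if t in table]
    return np.mean(vecs, axis=0) if vecs else None

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def likely_clones(tokens_a, tokens_b, table, threshold=0.9):
    """Clone detection as a similarity threshold over snippet vectors:
    a deliberately simple baseline, not a full clone detector."""
    va, vb = snippet_vector(tokens_a, table), snippet_vector(tokens_b, table)
    return va is not None and vb is not None and cosine(va, vb) >= threshold
```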

67 citations

Proceedings ArticleDOI
10 Nov 2019
TL;DR: A multi-modal attention fusion layer is proposed to assign weights to different parts of source code and integrate them into a single hybrid representation for semantic source code retrieval.
Abstract: Code retrieval techniques and tools have been playing a key role in helping software developers retrieve existing code fragments from available open-source repositories given a user query (e.g., a short natural language text describing the functionality for retrieving a particular code snippet). Despite the existing efforts in improving the effectiveness of code retrieval, there are still two main issues hindering them from being used to accurately retrieve satisfiable code fragments from large-scale repositories when answering complicated queries. First, the existing approaches only consider shallow features of source code such as method names and code tokens, but ignore structured features such as abstract syntax trees (ASTs) and control-flow graphs (CFGs) of source code, which contain rich and well-defined semantics of source code. Second, although deep learning-based approaches represent source code well, they lack explainability, making it hard to interpret the retrieval results and almost impossible to understand which features of source code contribute most to the final results. To tackle the two aforementioned issues, this paper proposes MMAN, a novel Multi-Modal Attention Network for semantic source code retrieval. A comprehensive multi-modal representation is developed for representing unstructured and structured features of source code, with one LSTM for the sequential tokens of code, a Tree-LSTM for the AST of code and a GGNN (Gated Graph Neural Network) for the CFG of code. Furthermore, a multi-modal attention fusion layer is applied to assign weights to different parts of each modality of source code and then integrate them into a single hybrid representation. Comprehensive experiments and analysis on a large-scale real-world dataset show that our proposed model can accurately retrieve code snippets and outperforms the state-of-the-art methods.
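
The fusion step can be illustrated in a few lines of PyTorch: one vector per modality (token LSTM, AST Tree-LSTM, CFG GGNN outputs), scalar attention weights, and a weighted sum. This is a hedged sketch of the general mechanism; the dimensions, the scoring function, and the class name are assumptions, not MMAN's actual architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy multi-modal attention fusion: score each modality vector,
    normalize the scores with softmax, and return the weighted sum."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, modality_vectors):                 # (num_modalities, dim)
        weights = torch.softmax(self.score(modality_vectors), dim=0)  # (m, 1)
        return (weights * modality_vectors).sum(dim=0)                # (dim,)

# Example: fuse three 128-d modality representations into one code vector.
fusion = AttentionFusion(dim=128)
token_vec, ast_vec, cfg_vec = (torch.randn(128) for _ in range(3))
code_vec = fusion(torch.stack([token_vec, ast_vec, cfg_vec]))
```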

67 citations

Proceedings ArticleDOI
26 Oct 2018
TL;DR: This paper uses abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings and shows that embeddings learned from semantic abstractions provide nearly triple the accuracy of those learned from syntactic abstractions.
Abstract: With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions.
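
A hedged sketch of this pipeline: treat each abstracted trace as a "sentence" of API-call tokens, train a skip-gram model over the corpus, and evaluate with API-usage analogies. The toy traces and abstraction below are illustrative, not the paper's Linux-kernel data or its actual abstraction scheme, so this tiny corpus will not reproduce the reported accuracy.

```python
from gensim.models import Word2Vec

# Abstracted traces: sequences of API-call tokens with concrete values
# stripped away (hypothetical examples).
abstract_traces = [
    ["kmalloc", "CHECK_NULL", "memset", "kfree"],
    ["mutex_lock", "list_add", "mutex_unlock"],
    ["spin_lock", "list_add", "spin_unlock"],
]

# Skip-gram embeddings over the trace corpus.
model = Word2Vec(abstract_traces, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=50, seed=0)

# API-usage analogy: mutex_lock : mutex_unlock :: spin_lock : ?
# (top-1 accuracy over such analogies is the paper's benchmark metric).
answer = model.wv.most_similar(positive=["mutex_unlock", "spin_lock"],
                               negative=["mutex_lock"], topn=1)
print(answer)
```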

66 citations


Cites background from "Learning Program Embeddings to Prop..."

  • ...ecent techniques that approach the problem of embedding programs (or, more generally, symbolic-expressions/trees) in unique ways. Using input/output pairs as the input data for learning, Piech et al. [42] and Parisotto et al. [39] learn to embed whole programs. Using sequences of live variable values, Wang et al. [49] produce embeddings to aid in program repair tasks. Allamanis et al. [4] learn to emb...


Journal ArticleDOI
TL;DR: This research presents a framework that formalizes a process where a hypothesis-driven approach informed by Evidence-Centered Design effectively complements data-driven learning analytics in interpreting students’ programming process and assessing CT in block-based programming environments.
Abstract: Systematic endeavors to take computer science (CS) and computational thinking (CT) to scale in middle and high school classrooms are underway with curricula that emphasize the enactment of authentic CT skills, especially in the context of programming in block-based programming environments. There is, therefore, a growing need to measure students’ learning of CT in the context of programming and also support all learners through this process of learning computational problem solving. The goal of this research is to explore hypothesis-driven approaches that can be combined with data-driven ones to better interpret student actions and processes in log data captured from block-based programming environments with the goal of measuring and assessing students’ CT skills. Informed by past literature and based on our empirical work examining a dataset from the use of the Fairy Assessment in the Alice programming environment in middle schools, we present a framework that formalizes a process where a hypothesis-driven approach informed by Evidence-Centered Design effectively complements data-driven learning analytics in interpreting students’ programming process and assessing CT in block-based programming environments. We apply the framework to the design of Alice tasks for high school CS to be used for measuring CT during programming.

59 citations


Cites background from "Learning Program Embeddings to Prop..."

  • ...[49] also studied data collected from hundreds of thousands of users working on Code....


Proceedings ArticleDOI
19 Apr 2017
TL;DR: A semantic-aware technique to provide personalized feedback that aims to mimic an instructor looking for code snippets in student submissions, modeled as subgraph patterns with natural language feedback attached to them is presented.
Abstract: Currently, there is a "boom" in introductory programming courses to help students develop their computational thinking skills. Providing timely, personalized feedback that makes students reflect about what and why they did correctly or incorrectly is critical in such courses. However, the limited number of instructors and the great volume of submissions instructors need to assess, especially in Massive Open Online Courses (MOOCs), prove this task a challenge. One solution is to hire graders or create peer discussions among students; however, feedback may be too general, incomplete or even incorrect. Automatic techniques focus on: a) functional testing, in which feedback usually does not sufficiently guide novices, b) software verification to find code bugs, which may confuse novices since these tools usually skip true errors or produce false errors, and c) comparison against reference solutions, in which a large amount of reference solutions or pre-existing correct submissions is usually required. This paper presents a semantic-aware technique to provide personalized feedback that aims to mimic an instructor looking for code snippets in student submissions. These snippets are modeled as subgraph patterns with natural language feedback attached to them. Submissions are transformed into extended program dependence graphs combining control and data flows. We leverage subgraph matching techniques to compute the adequate personalized feedback. Also, constraints correlating patterns allow performing fine-grained assessments. We have evaluated our method on several introductory programming assignments and a large number of submissions. Our technique delivered personalized feedback in milliseconds using a small set of patterns, which makes it appealing in real-world settings.
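
The matching step can be sketched with networkx's subgraph isomorphism: each instructor pattern is a small labeled graph with a feedback message attached, and a submission receives every message whose pattern occurs in its program dependence graph. The node labels, the toy pattern, and the message below are illustrative assumptions, not the paper's pattern language.

```python
import networkx as nx
from networkx.algorithms import isomorphism

def feedback_for(submission_pdg, patterns):
    """Return the feedback of every pattern that appears as a (node-induced)
    subgraph of the submission's program dependence graph."""
    match_label = isomorphism.categorical_node_match("label", None)
    messages = []
    for pattern_graph, message in patterns:
        matcher = isomorphism.DiGraphMatcher(
            submission_pdg, pattern_graph, node_match=match_label)
        if matcher.subgraph_is_isomorphic():
            messages.append(message)
    return messages

# A hypothetical pattern: an assignment feeding a loop condition,
# paired with the instructor's natural-language feedback.
pattern = nx.DiGraph()
pattern.add_node("a", label="assign")
pattern.add_node("b", label="while")
pattern.add_edge("a", "b")
patterns = [(pattern, "Loop condition may never change: check the update step.")]
```

Calling feedback_for(student_pdg, patterns) on a submission's extended PDG would then return the matching messages.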

48 citations


Cites background or methods from "Learning Program Embeddings to Prop..."

  • ...Other approaches rely on one or more reference solutions to a given assignment that are compared over student submissions [14], [15], [27], [28], [33], [36], [40]....


  • ...[28] exploit machine learning to provide personalized feedback that requires the existence of a large amount of previous student submissions, which is not always the case [27]....


  • ...Another approach to providing personalized feedback consists of comparing student submissions with one or more reference solutions [14], [15], [27], [28], [33], [36], [40]....


  • ...[27] require the number of variables to be fixed beforehand, which may be not beneficial for novice students developing their problem-solving skills [12]....


  • ...Furthermore, some of them also have scalability problems that prevent them from being used in real-world settings [15], or require a large amount of pre-existing correct submissions [15], [27], [28]....


References
Proceedings Article
01 Jan 2010
TL;DR: A new family of adaptive subgradient methods dynamically incorporates knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, allowing very predictive but rarely seen features to be found like needles in haystacks.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
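
The per-coordinate rule at the heart of the method (which the citing paper uses for setting learning rates, per the excerpt below) can be written in a few lines. This is the standard diagonal-Adagrad update only, not the paper's full proximal framework; the step size and epsilon are arbitrary example values.

```python
import numpy as np

def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal-Adagrad update: each coordinate's effective step size
    shrinks with the accumulated squared gradients seen for that coordinate."""
    accum += grad ** 2
    params -= lr * grad / (np.sqrt(accum) + eps)
    return params, accum

# Usage: keep one accumulator per parameter array, updated every step.
w = np.zeros(4)
accum = np.zeros_like(w)
grad = np.array([0.5, -0.2, 0.0, 1.0])
w, accum = adagrad_step(w, grad, accum)
```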

7,244 citations

Journal Article
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.

6,984 citations


Additional excerpts

  • ...Learning rates are set using Adagrad (Duchi et al., 2011)....


Journal Article
TL;DR: This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper- parameter optimization algorithms.
Abstract: Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
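
A minimal version of the procedure, as it might be used for the hyperparameter search mentioned in the excerpt below (regularization parameters, matrix dimensions, minibatch size): sample configurations uniformly at random and keep the best. The search space and the train_and_score callback are placeholders for whatever model and metric are being tuned.

```python
import random

def random_search(train_and_score, space, n_trials=50, seed=0):
    """Sample hyperparameter settings uniformly from 'space' and keep the
    best-scoring configuration found within the trial budget."""
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score

# Example search space (hypothetical values).
space = {
    "l2": [1e-4, 1e-3, 1e-2],
    "dim": [16, 32, 64, 128],
    "batch_size": [32, 64, 128],
}
```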

6,935 citations


"Learning Program Embeddings to Prop..." refers methods in this paper

  • ...We use random search (Bergstra & Bengio, 2012) to optimize over hyperparameters (e.g., regularization parameters, matrix dimensions, and minibatch size)....


Proceedings Article
01 Oct 2013
TL;DR: A Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced, presenting new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network to address them.
Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
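
The compositional idea that the citing paper adapts to abstract syntax trees can be sketched as a plain recursive neural network: leaves get token vectors and each internal node combines its two children with a shared layer. This is a simple RNN sketch with random, untrained parameters, not the Recursive Neural Tensor Network itself; the vector size and example tree are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                     # vector size (hypothetical)
W = rng.normal(scale=0.1, size=(D, 2 * D))  # shared composition weights
b = np.zeros(D)
leaf_embedding = {}                         # token -> vector (here: random)

def embed_tree(node):
    """Recursive composition: a leaf gets its token vector, an internal
    node combines its two children through the shared tanh layer."""
    if isinstance(node, str):                # leaf token
        if node not in leaf_embedding:
            leaf_embedding[node] = rng.normal(scale=0.1, size=D)
        return leaf_embedding[node]
    left, right = node                        # binary tree node
    children = np.concatenate([embed_tree(left), embed_tree(right)])
    return np.tanh(W @ children + b)

# ("not", ("very", "good")) composes bottom-up along the parse tree.
vec = embed_tree(("not", ("very", "good")))
```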

6,792 citations


"Learning Program Embeddings to Prop..." refers background or methods in this paper

  • ...The programs for these assignments operate in maze worlds where an agent can move, turn, and test for conditions of its current location....


  • ...Our models are related to recent work from the NLP and deep learning communities on recursive neural networks, particularly for modeling semantics in sentences or symbolic expressions (Socher et al., 2013; 2011; Zaremba et al., 2014; Bowman, 2013)....


  • ...…on recursive neural networks (called the NPM-RNN model) in which we parametrize a matrix MA in this new model with an RNN whose architecture follows the abstract syntax tree (similar to the way in which RNN architectures might take the form of a parse tree in an NLP setting (Socher et al., 2013))....


Book
04 Oct 1993
TL;DR: This paper presents a graph-theoretic complexity measure for managing and controlling program complexity; the measure is independent of physical size and depends only on the decision structure of a program.
Abstract: This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual FORTRAN programs are then presented to illustrate the correlation between intuitive complexity and the graph theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program.The issue of using non-structured control flow is also discussed. A characterization of non-structured control graphs is given and a method of measuring the “structuredness” of a program is developed. The relationship between structure and reducibility is illustrated with several examples.The last section of the paper deals with a testing methodology used in conjunction with the complexity measure; a testing strategy is defined that dictates that a program can either admit of a certain minimal testing level or the program can be structurally reduced.
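
The measure itself is compact: for a control-flow graph with E edges, N nodes, and P connected components (procedures), the cyclomatic number is V(G) = E - N + 2P. A short sketch, assuming networkx for the graph representation:

```python
import networkx as nx

def cyclomatic_complexity(cfg, num_components=1):
    """McCabe's measure V(G) = E - N + 2P for a control-flow graph."""
    return cfg.number_of_edges() - cfg.number_of_nodes() + 2 * num_components

# A tiny CFG: entry -> if -> (then | else) -> exit gives V(G) = 2,
# matching its single decision point.
cfg = nx.DiGraph([("entry", "if"), ("if", "then"), ("if", "else"),
                  ("then", "exit"), ("else", "exit")])
print(cyclomatic_complexity(cfg))
```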

5,171 citations