Proceedings Article

Learning Program Embeddings to Propagate Feedback on Student Code

06 Jul 2015 - pp. 1093-1102
TL;DR: A neural network method is introduced to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space, and an algorithm for feedback at scale is proposed using these linear maps as features.
Abstract: Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes which can range from thousands to millions of students. We introduce a neural network method to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and propose an algorithm for feedback at scale using these linear maps as features. We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University's CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions.
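The core encoding idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the embedding dimension, the identity state embedding, and the single SGD step are all assumed for brevity.

```python
import numpy as np

# Sketch of the paper's core idea: a program is represented as a learned
# linear map M from an embedded precondition space to an embedded
# postcondition space. All dimensions and names here are illustrative.

d = 50                                   # assumed embedding dimension
rng = np.random.default_rng(0)
M = rng.normal(scale=0.1, size=(d, d))   # per-program matrix (learned)

def embed_state(state_features):
    """Placeholder embedding; the paper learns state embeddings jointly."""
    return state_features                # assume features already in R^d

def loss(pre, post, M):
    """Squared distance between the mapped precondition and the
    observed postcondition embedding."""
    return np.sum((M @ embed_state(pre) - embed_state(post)) ** 2)

# One gradient step on M for a single (precondition, postcondition) pair:
# dL/dM = 2 (M p - q) p^T.
p, q = rng.normal(size=d), rng.normal(size=d)
M -= 0.01 * 2 * np.outer(M @ p - q, p)
```

Once such matrices are learned per submission, they can serve as fixed-size features for propagating instructor feedback, as the abstract describes.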


Citations
Proceedings Article
01 Jun 2017
TL;DR: This work feeds the embedded program submission sequence into a recurrent neural network and trains it on two tasks of predicting the student's future performance; the trained model learns nuanced representations of a student's knowledge, exposes patterns about a student's learning behavior, and reliably predicts future student performance.
Abstract: Modeling student knowledge while students are acquiring new concepts is a crucial stepping stone towards providing personalized automated feedback at scale. We believe that rich information about a student’s learning is captured within her responses to open-ended problems with unbounded solution spaces, such as programming exercises. In addition, sequential snapshots of a student’s progress while she is solving a single exercise can provide valuable insights into her learning behavior. Creating representations for a student’s knowledge state is a challenging task, but with recent advances in machine learning, there are more promising techniques to learn representations for complex entities. In our work, we feed the embedded program submission sequence into a recurrent neural network and train it on two tasks of predicting the student’s future performance. By training on these tasks, the model learns nuanced representations of a student’s knowledge, exposes patterns about a student’s learning behavior, and reliably predicts future student performance. Even more importantly, the model differentiates within a pool of poorly performing students and picks out students who have true knowledge gaps, giving teachers early warnings to provide assistance.
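As a rough illustration of the approach described above, the sketch below assumes each submission snapshot has already been embedded as a fixed-size vector; the GRU, the layer sizes, and the single prediction head are illustrative choices, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class StudentModel(nn.Module):
    """Consume a student's embedded submission sequence with an RNN and
    predict future performance (here: probability of next-problem success)."""

    def __init__(self, d=50, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(d, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, submission_seq):        # (batch, time, d)
        _, h = self.rnn(submission_seq)       # final hidden state
        return torch.sigmoid(self.head(h[-1]))

model = StudentModel()
seq = torch.randn(8, 12, 50)                  # 8 students, 12 snapshots each
pred = model(seq)                             # (8, 1) success probabilities
loss = nn.functional.binary_cross_entropy(pred, torch.ones(8, 1))
loss.backward()                               # train on observed outcomes
```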

75 citations


Cites methods from "Learning Program Embeddings to Propagate Feedback on Student Code"

  • ...These ASTs are converted into program embeddings using a recursive neural network similar to the one described in [15]....


  • ...proposed to use recursive neural networks to create program embeddings for student code[15]....


Proceedings Article
27 May 2018
TL;DR: A novel neuro-symbolic approach that combines neural networks with constraint-based reasoning is proposed; it repairs syntax errors in 60% of submissions and finds functionally correct repairs for 23.8% of submitted programs.
Abstract: Automatic correction of programs is a challenging problem with numerous real-world applications in security, verification, and education. One application that is becoming increasingly important is the correction of student submissions in online courses for providing feedback. Most existing program repair techniques analyze Abstract Syntax Trees (ASTs) of programs, which are unfortunately unavailable for programs with syntax errors. In this paper, we propose a novel neuro-symbolic approach that combines neural networks with constraint-based reasoning. Specifically, our method first uses a Recurrent Neural Network (RNN) to perform syntax repairs for the buggy programs; subsequently, the resulting syntactically-fixed programs are repaired using constraint-based techniques to ensure functional correctness. The RNNs are trained using a corpus of syntactically correct submissions for a given programming assignment, and are then queried to fix syntax errors in an incorrect programming submission by replacing or inserting the predicted tokens at the error location. We evaluate our technique on a dataset comprising over 14,500 student submissions with syntax errors. Our method repairs syntax errors in 60% (8,689) of submissions and finds functionally correct repairs for 23.8% (3,455) of them.
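The syntax-repair stage lends itself to a compact sketch: a token-level language model trained on syntactically correct submissions ranks candidate tokens at a parser-reported error location. The vocabulary, layer sizes, and helper names below are hypothetical, and the model shown is untrained.

```python
import torch
import torch.nn as nn

VOCAB = ["<pad>", "def", "(", ")", ":", "return", "x", "+", "1", "\n"]
tok2id = {t: i for i, t in enumerate(VOCAB)}

class TokenLM(nn.Module):
    """LSTM language model over program tokens (illustrative sizes)."""

    def __init__(self, vocab=len(VOCAB), emb=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, ids):                   # (batch, time)
        h, _ = self.lstm(self.emb(ids))
        return self.out(h)                    # next-token logits

def propose_fixes(model, prefix_ids, k=3):
    """Rank the k most likely tokens to insert after a broken prefix."""
    logits = model(prefix_ids.unsqueeze(0))[0, -1]
    return [VOCAB[i] for i in logits.topk(k).indices]

model = TokenLM()
prefix = torch.tensor([tok2id[t] for t in ["def", "x", "("]])
print(propose_fixes(model, prefix))           # untrained: arbitrary ranking
```

In the cited pipeline, candidates like these would then be checked and refined by the constraint-based stage for functional correctness.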

74 citations

Journal Article
TL;DR: This paper presents a new code summarization approach that uses a hierarchical attention network to incorporate multiple code features, including type-augmented abstract syntax trees and program control flows, into a deep reinforcement learning (DRL) framework for comment generation.
Abstract: Code summarization (aka comment generation) provides a high-level natural language description of the function performed by code, which can benefit software maintenance, code categorization, and retrieval. To the best of our knowledge, the state-of-the-art approaches follow an encoder-decoder framework which encodes source code into a hidden space and later decodes it into a natural language space. Such approaches suffer from the following drawbacks: (a) they mainly represent code as a sequence of tokens while ignoring code hierarchy; (b) most of the encoders use only simple features (e.g., tokens) while ignoring the features that can help capture the correlations between comments and code; (c) the decoders are typically trained to predict the next word by maximizing the likelihood of the ground-truth next word, while in the real world they are expected to generate the entire word sequence from scratch. As a result, such drawbacks lead to inferior and inconsistent comment generation accuracy. To address the above limitations, this paper presents a new code summarization approach using a hierarchical attention network that incorporates multiple code features, including type-augmented abstract syntax trees and program control flows. Such features, along with plain code sequences, are injected into a deep reinforcement learning (DRL) framework (e.g., an actor-critic network) for comment generation. Our approach assigns weights (pays “attention”) to tokens and statements when constructing the code representation to reflect the hierarchical code structure under different contexts regarding code features (e.g., control flows and abstract syntax trees). Our reinforcement learning mechanism further strengthens the prediction results through the actor network and the critic network, where the actor network provides the confidence of predicting subsequent words based on the current state, and the critic network computes the reward values of all the possible extensions of the current state to provide global guidance for explorations. Eventually, we employ an advantage reward to train both networks and conduct a set of experiments on a real-world dataset. The experimental results demonstrate that our approach outperforms the baselines by around 22% to 45% in BLEU-1 and outperforms the state-of-the-art approaches by around 5% to 60% in terms of S-BLEU and C-BLEU.
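The token-level attention component can be sketched compactly. The snippet below shows only attention pooling over token states, with illustrative sizes; the full approach adds statement-level attention, the extra code features, and the actor-critic training stage.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Learn a weight per token so the pooled code vector emphasizes the
    tokens that matter most for comment generation."""

    def __init__(self, d=64):
        super().__init__()
        self.score = nn.Linear(d, 1)

    def forward(self, token_states):              # (batch, tokens, d)
        w = torch.softmax(self.score(token_states), dim=1)
        return (w * token_states).sum(dim=1)       # (batch, d)

pool = AttentionPool()
code_vec = pool(torch.randn(4, 20, 64))            # 4 snippets, 20 tokens each
```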

73 citations

Posted Content
Lili Mou, Ge Li, Zhi Jin, Lu Zhang, Tao Wang 
TL;DR: To the best of the authors' knowledge, this paper is the first to analyze programs with deep neural networks and extends the scope of deep learning to the field of programming language processing; the experimental results validate its feasibility and show a promising future for this new research area.
Abstract: Deep neural networks have made significant breakthroughs in many fields of artificial intelligence. However, they have not been applied in the field of programming language processing. In this paper, we propose the tree-based convolutional neural network (TBCNN) to model programming languages, which contain rich and explicit tree structural information. In our model, program vector representations are learned by the "coding" pretraining criterion based on abstract syntax trees (ASTs); the convolutional layer explicitly captures neighboring features on the tree; with the "binary continuous tree" and "3-way pooling," our model can deal with ASTs of different shapes and sizes. We evaluate the program vector representations empirically, showing that the coding criterion successfully captures underlying features of AST nodes, and that program vector representations significantly speed up supervised learning. We also compare TBCNN to baseline methods; our model achieves better accuracy in the task of program classification. To the best of our knowledge, this paper is the first to analyze programs with deep neural networks; we extend the scope of deep learning to the field of programming language processing. The experimental results validate its feasibility and show a promising future for this new research area.
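A hedged sketch of the tree-based convolution, assuming each AST node carries a vector and the tree is encoded as (vector, children) pairs; the weight names and the averaging over children are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

d = 32
rng = np.random.default_rng(1)
W_node = rng.normal(scale=0.1, size=(d, d))    # weight for the parent node
W_child = rng.normal(scale=0.1, size=(d, d))   # shared weight for children

def conv_window(node_vec, child_vecs):
    """Convolve one (parent, children) window into a feature vector."""
    total = W_node @ node_vec
    if child_vecs:
        total += sum(W_child @ c for c in child_vecs) / len(child_vecs)
    return np.tanh(total)

def tree_features(node):
    """Post-order traversal collecting one feature vector per window."""
    vec, children = node
    feats = [conv_window(vec, [c[0] for c in children])]
    for child in children:
        feats.extend(tree_features(child))
    return feats

leaf = (rng.normal(size=d), [])
root = (rng.normal(size=d), [leaf, leaf])
pooled = np.max(tree_features(root), axis=0)   # fixed-size vector per tree
```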

69 citations

Posted Content
TL;DR: An exhaustive evaluation on the task of checking equivalence on a highly diverse class of symbolic algebraic and boolean expression types is performed, showing that the proposed neural equivalence networks model significantly outperforms existing architectures.
Abstract: Combining abstract, symbolic reasoning with continuous neural reasoning is a grand challenge of representation learning. As a step in this direction, we propose a new architecture, called neural equivalence networks, for the problem of learning continuous semantic representations of algebraic and logical expressions. These networks are trained to represent semantic equivalence, even of expressions that are syntactically very different. The challenge is that semantic representations must be computed in a syntax-directed manner, because semantics is compositional, but at the same time, small changes in syntax can lead to very large changes in semantics, which can be difficult for continuous neural architectures. We perform an exhaustive evaluation on the task of checking equivalence on a highly diverse class of symbolic algebraic and boolean expression types, showing that our model significantly outperforms existing architectures.
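One plausible reading of the architecture is sketched below: a per-operator composition network is applied bottom-up over the expression tree, and every output is normalized to unit length so representations stay comparable across depths. The weights are random and all names are hypothetical.

```python
import numpy as np

d = 16
rng = np.random.default_rng(3)
ops = {op: rng.normal(scale=0.1, size=(d, 2 * d)) for op in ("and", "or")}
leaves = {v: rng.normal(size=d) for v in ("a", "b")}

def embed(expr):
    """expr is a variable name or a tuple (op, left, right)."""
    if isinstance(expr, str):
        v = leaves[expr]
    else:
        op, left, right = expr
        v = np.tanh(ops[op] @ np.concatenate([embed(left), embed(right)]))
    return v / np.linalg.norm(v)           # unit-norm outputs

# After training, syntactically different but semantically equivalent
# expressions should embed nearby; with random weights this is just a demo.
print(embed(("and", "a", "b")) @ embed(("and", "b", "a")))
```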

69 citations


Cites background from "Learning Program Embeddings to Propagate Feedback on Student Code"

  • ...Piech et al. (2015) also learn distributed matrix representations of student code submissions....


References

Journal Article
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.

6,984 citations


Additional excerpts

  • ...Learning rates are set using Adagrad (Duchi et al., 2011)....

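The excerpt above refers to Adagrad, whose per-coordinate update is small enough to sketch directly; the learning rate and the toy objective below are illustrative.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """Adagrad: scale each coordinate's step by the inverse square root
    of its accumulated squared gradients."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy use: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
accum = np.zeros_like(w)
for _ in range(200):
    w, accum = adagrad_step(w, 2 * w, accum)
print(w)  # approaches the origin
```

Rarely updated coordinates keep small accumulators and thus retain large effective step sizes, which is the "needles in haystacks" behavior the abstract describes.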

Journal Article
TL;DR: This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
Abstract: Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.

6,935 citations


"Learning Program Embeddings to Prop..." refers methods in this paper

  • ...We use random search (Bergstra & Bengio, 2012) to optimize over hyperparameters (e.g., regularization parameters, matrix dimensions, and minibatch size)....

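The excerpt refers to random hyperparameter search; a minimal loop in that spirit is sketched below. The search space and the stand-in objective are hypothetical, not the paper's actual configuration.

```python
import random

def sample_config(rng):
    """Sample each hyperparameter independently from its range."""
    return {
        "log10_l2": rng.uniform(-6, -1),         # regularization strength
        "embed_dim": rng.choice([25, 50, 100]),  # matrix dimension
        "minibatch": rng.choice([16, 32, 64]),
    }

def random_search(evaluate, trials=20, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        cfg = sample_config(rng)
        score = evaluate(cfg)                    # e.g., validation accuracy
        if best is None or score > best[0]:
            best = (score, cfg)
    return best

# Stand-in objective; a real run would train and score the model.
print(random_search(lambda cfg: -abs(cfg["log10_l2"] + 3)))
```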

Proceedings Article
01 Oct 2013
TL;DR: A Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced, posing new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network to address them.
Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag-of-features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
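The Recursive Neural Tensor Network's composition step fits in a few lines; the dimensions and random weights below are illustrative, and a real model learns them from the treebank.

```python
import numpy as np

d = 10
rng = np.random.default_rng(2)
V = rng.normal(scale=0.1, size=(d, 2 * d, 2 * d))  # tensor slices
W = rng.normal(scale=0.1, size=(d, 2 * d))         # standard affine term

def compose(a, b):
    """RNTN composition: combine two child vectors through a tensor plus
    a linear layer, then squash."""
    ab = np.concatenate([a, b])                     # (2d,)
    tensor_term = np.array([ab @ V[k] @ ab for k in range(d)])
    return np.tanh(tensor_term + W @ ab)

left, right = rng.normal(size=d), rng.normal(size=d)
parent = compose(left, right)                       # phrase vector, (d,)
```

Applying this composition up a parse tree is the same syntax-directed pattern the main paper adapts to abstract syntax trees in its NPM-RNN model.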

6,792 citations


"Learning Program Embeddings to Prop..." refers background or methods in this paper

  • ...The programs for these assignments operate in maze worlds where an agent can move, turn, and test for conditions of its current location....


  • ...Our models are related to recent work from the NLP and deep learning communities on recursive neural networks, particularly for modeling semantics in sentences or symbolic expressions (Socher et al., 2013; 2011; Zaremba et al., 2014; Bowman, 2013)....


  • ...…on recursive neural networks (called the NPM-RNN model) in which we parametrize a matrix MA in this new model with an RNN whose architecture follows the abstract syntax tree (similar to the way in which RNN architectures might take the form of a parse tree in an NLP setting (Socher et al., 2013))....


Journal Article
TL;DR: A graph-theoretic complexity measure for managing and controlling program complexity is presented; the measure is independent of physical size and depends only on the decision structure of a program.
Abstract: This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual FORTRAN programs are then presented to illustrate the correlation between intuitive complexity and the graph theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program.The issue of using non-structured control flow is also discussed. A characterization of non-structured control graphs is given and a method of measuring the “structuredness” of a program is developed. The relationship between structure and reducibility is illustrated with several examples.The last section of the paper deals with a testing methodology used in conjunction with the complexity measure; a testing strategy is defined that dictates that a program can either admit of a certain minimal testing level or the program can be structurally reduced.
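The measure itself is straightforward to compute from a control-flow graph using V(G) = E - N + 2P (edges, nodes, connected components); the sketch below applies it to a toy if/else graph.

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's measure: V(G) = E - N + 2P for a control-flow graph."""
    return len(edges) - len(nodes) + 2 * components

# Toy control-flow graph: one if/else decision that rejoins at "exit".
nodes = {"entry", "cond", "then", "else", "exit"}
edges = [("entry", "cond"), ("cond", "then"), ("cond", "else"),
         ("then", "exit"), ("else", "exit")]
print(cyclomatic_complexity(edges, nodes))  # 5 - 5 + 2 = 2
```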

5,171 citations