Learning Program Embeddings to Propagate Feedback on Student Code

Proceedings Article

Chris Piech¹, Jonathan Huang², Andy Nguyen¹, Mike Phulsuksombati¹, Mehran Sahami¹, Leonidas J. Guibas¹

Stanford University¹, Google²

06 Jul 2015, pp. 1093-1102

TL;DR: A neural network method is introduced to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and an algorithm for feedback at scale is proposed using these linear maps as features.

Abstract: Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes which can range from thousands to millions of students. We introduce a neural network method to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and propose an algorithm for feedback at scale using these linear maps as features. We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University's CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions.
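As an illustration of the abstract's central idea, the sketch below treats one program as a matrix that maps a precondition embedding to a postcondition embedding. It is a minimal pure-Python sketch under stated assumptions: the state embeddings are fixed toy vectors (the paper learns them), the matrix is fitted by plain stochastic gradient descent on squared error, and the helper names `fit_program_matrix` and `compose` are hypothetical rather than taken from the paper.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(x, y):
    """Multiply matrices x and y (lists of rows)."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def fit_program_matrix(pairs, dim, lr=0.1, epochs=500, seed=0):
    """Fit M so that M @ pre approximates post for every (pre, post) pair.

    Hypothetical helper: plain SGD on 0.5 * ||M @ pre - post||^2, with the
    embeddings treated as fixed inputs instead of being learned jointly.
    """
    rng = random.Random(seed)
    m = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for pre, post in rng.sample(pairs, len(pairs)):
            err = [p - q for p, q in zip(matvec(m, pre), post)]
            # The gradient of the loss w.r.t. M[i][j] is err[i] * pre[j].
            for i in range(dim):
                for j in range(dim):
                    m[i][j] -= lr * err[i] * pre[j]
    return m


def compose(m_second, m_first):
    """Matrix for 'run the first program, then the second' under the linear model."""
    return matmul(m_second, m_first)
```

Under this linear model, running program A and then program B corresponds to the single matrix M_B·M_A, which is what `compose` returns. One simple (assumed) way to use a fitted map as a feature vector is to flatten its entries.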

Citations

Proceedings Article

An Exploration of Automated Grading of Complex Assignments

Chase Geigle¹, ChengXiang Zhai¹, Duncan C. Ferguson¹

University of Illinois at Urbana–Champaign¹

25 Apr 2016

TL;DR: This paper conducts the first systematic study of how to automate grading of a complex assignment using a medical case assessment as a test case and proposes a sequential pairwise online active learning strategy to minimize the effort of human grading and optimize the collaboration of human graders and an automated grader.

Abstract: Automated grading is essential for scaling up learning. In this paper, we conduct the first systematic study of how to automate grading of a complex assignment using a medical case assessment as a test case. We propose to solve this problem using a supervised learning approach and introduce three general complementary types of feature representations of such complex assignments for use in supervised learning. We first show with empirical experiments that it is feasible to automate grading of such assignments provided that the instructor can grade a number of examples. We further study how to integrate an automated grader with human grading and propose to frame the problem as learning to rank assignments to exploit pairwise preference judgments and use NDPM as a measure for evaluation of the accuracy of ranking. We then propose a sequential pairwise online active learning strategy to minimize the effort of human grading and optimize the collaboration of human graders and an automated grader. Experiment results show that this strategy is indeed effective and can substantially reduce human effort as compared with randomly sampling assignments for manual grading.
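The abstract names NDPM without defining it. Assuming the standard definition of Yao's normalized distance-based performance measure, NDPM = (2·C_contra + C_tie) / (2·C_ref), where C_ref counts item pairs the reference ranking strictly orders, C_contra counts those pairs the system orders the opposite way, and C_tie counts those pairs the system ties. The function below is a sketch of that formula; whether Geigle et al. use exactly this variant is an assumption, and the name `ndpm` is illustrative.

```python
from itertools import combinations


def ndpm(reference, system):
    """NDPM between a reference ranking and a system ranking (assumed standard form).

    Both arguments map item -> score, where a higher score means more preferred.
    Pairs that the reference ties are ignored.
    """
    contra = 0   # C_contra: reference and system strictly disagree
    ties = 0     # C_tie: reference strictly orders the pair, system ties it
    ref_pairs = 0  # C_ref: pairs the reference strictly orders
    for a, b in combinations(reference, 2):
        ref = (reference[a] > reference[b]) - (reference[a] < reference[b])
        if ref == 0:
            continue
        ref_pairs += 1
        pred = (system[a] > system[b]) - (system[a] < system[b])
        if pred == 0:
            ties += 1
        elif pred != ref:
            contra += 1
    if ref_pairs == 0:
        return 0.0
    return (2 * contra + ties) / (2 * ref_pairs)
```

A score of 0 means the system agrees with every strict preference in the reference, 1 means it reverses all of them, and a tie by the system costs half as much as a reversal.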

24 citations

Cites background from "Learning Program Embeddings to Prop..."

...Recent efforts have focused on providing feedback to students about their programs by leveraging structural similarities in the code itself to allow feedback to be provided to many assignments at once that share particular features [23, 25]....

Journal Article

Adaptive structure metrics for automated feedback provision in intelligent tutoring systems

Benjamin Paaßen¹, Bassam Mokbel¹, Barbara Hammer¹

Bielefeld University¹

05 Jun 2016, Neurocomputing

TL;DR: This work presents a general-purpose framework to construct structure metrics on sequential data and to adapt those metrics using machine learning techniques, and demonstrates that metric adaptation improves the classification of wrong versus correct learner attempts in a simulated data set from sports training.
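The TL;DR does not say which structure metric the framework uses. As an assumed illustration, an edit distance over sequences whose operation costs are parameters is one kind of sequence metric that learning could adapt by tuning those costs. The sketch only computes the distance for given costs and does not implement the adaptation step; the function name and cost interface are hypothetical.

```python
def weighted_edit_distance(x, y, sub_cost, indel_cost=1.0):
    """Edit distance between sequences x and y with configurable costs.

    sub_cost(a, b) is the cost of replacing a by b (typically 0 when a == b);
    indel_cost is the cost of inserting or deleting one element.
    """
    n, m = len(x), len(y)
    # d[i][j] holds the cheapest way to turn x[:i] into y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # delete x[i-1]
                d[i][j - 1] + indel_cost,  # insert y[j-1]
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # match or replace
            )
    return d[n][m]
```

With a cost of 1 for every mismatch this is the ordinary Levenshtein distance; making one substitution cheap, such as replacing `turnLeft` by `turnRight` at cost 0.2, is the kind of adjustment an adapted metric would encode.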

24 citations

Cites background from "Learning Program Embeddings to Prop..."

...While first approaches exist to transform student data into a vectorial format, most data is still only available as structured data, such as sequences, trees or graphs [13]....

Proceedings Article

Neural Attribution for Semantic Bug-Localization in Student Programs

Rahul Gupta¹, Aditya Kanade¹, Shirish Shevade¹

Indian Institute of Science¹

01 Jan 2019

TL;DR: This work presents NeuralBugLocator, a deep learning based technique, that can localize the bugs in a faulty program with respect to a failing test, without even running the program.

Abstract: Providing feedback is an integral part of teaching. Most open online courses on programming make use of automated grading systems to support programming assignments and give real-time feedback. These systems usually rely on test results to quantify the programs' functional correctness. They return failing tests to the students as feedback. However, students may find it difficult to debug their programs if they receive no hints about where the bug is and how to fix it. In this work, we present NeuralBugLocator, a deep learning based technique, that can localize the bugs in a faulty program with respect to a failing test, without even running the program. At the heart of our technique is a novel tree convolutional neural network which is trained to predict whether a program passes or fails a given test. To localize the bugs, we analyze the trained network using a state-of-the-art neural prediction attribution technique and see which lines of the programs make it predict the test outcomes. Our experiments show that NeuralBugLocator is generally more accurate than two state-of-the-art program-spectrum based and one syntactic difference based bug-localization baselines.

23 citations

Cites methods from "Learning Program Embeddings to Prop..."

...The clusters are created either using heuristics based on program analysis techniques [8, 15, 10, 27, 23] or using program execution on a set of inputs [19, 20]....
[...]
...The clusters are typically used in the following two ways: (1) the feedback is generated manually for a representative program in each cluster and then customized to other members of the cluster automatically [19, 20, 8], and (2) for a buggy program, a correct program is selected from the same cluster as a reference implementation, which is then compared to the buggy program to generate a repair hint [15, 10, 27, 23]....
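The first use described above can be sketched as follows. The clustering key is a hypothetical whitespace-insensitive normalization of the source text, standing in for the program-analysis heuristics or execution-based clustering the excerpt cites, and `comment_for_representative` stands in for the human grader. The step of customizing the comment to each member is not modeled, so the comment is copied verbatim.

```python
from collections import defaultdict


def propagate_feedback(programs, cluster_key, comment_for_representative):
    """Comment one representative per cluster and copy it to every member.

    programs maps program_id -> source text; cluster_key maps source -> cluster id.
    Both callables are hypothetical stand-ins for real clustering and a human grader.
    """
    clusters = defaultdict(list)
    for program_id, source in programs.items():
        clusters[cluster_key(source)].append(program_id)
    feedback = {}
    for members in clusters.values():
        representative = members[0]  # first submission seen in the cluster
        comment = comment_for_representative(programs[representative])
        for program_id in members:
            feedback[program_id] = comment
    return feedback
```

The grader is called once per cluster rather than once per submission, which is where the savings come from when clusters are large.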

Posted Content

Neural Software Analysis.

Michael Pradel, Satish Chandra¹

Association for Computing Machinery¹

16 Nov 2020, arXiv: Software Engineering

TL;DR: Describes developer tools that use a neural machine learning model to make predictions about previously unseen code, helping developers understand how code is written and improve its quality.

Abstract: Many software development problems can be addressed by program analysis tools, which traditionally are based on precise, logical reasoning and heuristics to ensure that the tools are practical. Recent work has shown tremendous success through an alternative way of creating developer tools, which we call neural software analysis. The key idea is to train a neural machine learning model on numerous code examples, which, once trained, makes predictions about previously unseen code. In contrast to traditional program analysis, neural software analysis naturally handles fuzzy information, such as coding conventions and natural language embedded in code, without relying on manually encoded heuristics. This article gives an overview of neural software analysis, discusses when to (not) use it, and presents three example analyses. The analyses address challenging software development problems: bug detection, type prediction, and code completion. The resulting tools complement and outperform traditional program analyses, and are used in industrial practice.

22 citations

Cites methods from "Learning Program Embeddings to Prop..."

...g data. One promising direction is to neurally analyze software based on runtime information. So far, almost all existing work focuses on static neural software analysis, with some notable exceptions [30, 37]. Better models. A core concern of every neural software analysis is how to represent software as vectors that enable a neural model to reason about the software. Learned representations of code are a...

Proceedings Article

SemCluster: clustering of imperative programming assignments based on quantitative semantic features

David Perry¹, Dohyeong Kim¹, Roopsha Samanta¹, Xiangyu Zhang¹

Purdue University¹

08 Jun 2019

TL;DR: The comprehensive evaluation of the tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that it generates far fewer clusters, precisely identifies distinct solution strategies, and boosts the performance of clustering-based program repair, all within a reasonable amount of time.

Abstract: A fundamental challenge in automated reasoning about programming assignments at scale is clustering student submissions based on their underlying algorithms. State-of-the-art clustering techniques are sensitive to control structure variations, cannot cluster buggy solutions with similar correct solutions, and either require expensive pair-wise program analyses or training efforts. We propose a novel technique that can cluster small imperative programs based on their algorithmic essence: (A) how the input space is partitioned into equivalence classes and (B) how the problem is uniquely addressed within individual equivalence classes. We capture these algorithmic aspects as two quantitative semantic program features that are merged into a program's vector representation. Programs are then clustered using their vector representations. The computation of our first semantic feature leverages model counting to identify the number of inputs belonging to an input equivalence class. The computation of our second semantic feature abstracts the program's data flow by tracking the number of occurrences of a unique pair of consecutive values of a variable during its lifetime. The comprehensive evaluation of our tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that SemCluster (1) generates far fewer clusters than other clustering techniques, (2) precisely identifies distinct solution strategies, and (3) boosts the performance of clustering-based program repair, all within a reasonable amount of time.
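The second SemCluster feature can be sketched as counting transitions in a variable's value trace. This is a simplified illustration that assumes the sequence of values a variable takes has already been recorded, for example by running an instrumented program on one input. How SemCluster chooses inputs, combines variables, and merges this with the model-counting feature is not shown, and the function names are hypothetical.

```python
from collections import Counter


def consecutive_value_pairs(trace):
    """Count each (value, next value) transition in one variable's value trace."""
    return Counter(zip(trace, trace[1:]))


def to_vector(pair_counts, vocabulary):
    """Turn pair counts into a fixed-length vector over an agreed list of pairs.

    Pairs that never occurred get 0, because Counter returns 0 for missing keys.
    """
    return [pair_counts[pair] for pair in vocabulary]
```

For instance, an accumulator that takes the values 0, 1, 3, 6 yields the pairs (0, 1), (1, 3), and (3, 6), each counted once.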

22 citations

References

Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.

John C. Duchi¹, Elad Hazan², Yoram Singer³

University of California, Berkeley¹, IBM², Google³

01 Jan 2010

TL;DR: Adaptive subgradient methods dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which allows them to find needles in haystacks in the form of very predictive but rarely seen features.

Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
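The abstract describes the idea at a high level. The variant most often used in practice is the diagonal one, in which each coordinate's step is divided by the square root of that coordinate's accumulated squared gradients. The sketch below implements only that simplified diagonal update on a toy quadratic, not the general proximal-function framework the paper analyzes; `lr`, `eps`, and the function name are illustrative choices.

```python
import math


def adagrad(grad, x0, lr=0.5, steps=100, eps=1e-8):
    """Diagonal Adagrad: per-coordinate step lr / sqrt(sum of past squared gradients)."""
    x = list(x0)
    sq_sum = [0.0] * len(x)  # running sum of squared gradients, one per coordinate
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            sq_sum[i] += gi * gi
            x[i] -= lr * gi / (math.sqrt(sq_sum[i]) + eps)
    return x


# Toy objective: f(x) = (x0 - 3)^2 + 100 * (x1 + 1)^2, minimized at (3, -1).
def toy_grad(x):
    return [2 * (x[0] - 3), 200 * (x[1] + 1)]
```

Calling `adagrad(toy_grad, [0.0, 0.0], steps=200)` moves both coordinates toward (3, -1). Because the second coordinate's curvature is 100 times larger, its squared gradients accumulate faster and its effective step shrinks accordingly, so one base learning rate serves both coordinates.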

7,244 citations

Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John C. Duchi¹, Elad Hazan², Yoram Singer³

University of California, Berkeley¹, Princeton University², Google³

01 Feb 2011, Journal of Machine Learning Research

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.

6,984 citations

Additional excerpts

...Learning rates are set using Adagrad (Duchi et al., 2011)....

Journal Article

Random search for hyper-parameter optimization

James Bergstra¹, Yoshua Bengio¹

Université de Montréal¹

01 Mar 2012, Journal of Machine Learning Research

TL;DR: This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.

Abstract: Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
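A minimal sketch of random search as the abstract describes it: each trial draws every hyper-parameter independently from its range instead of from a fixed grid, and the best trial is kept. Sampling some ranges on a log scale is a common convention assumed here, not something the abstract specifies; the search-space format and the function name are hypothetical.

```python
import math
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample each hyper-parameter independently per trial and keep the best trial.

    space maps name -> (low, high, log_scale); lower objective values are better.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {}
        for name, (low, high, log_scale) in space.items():
            if log_scale:
                # Uniform in log space, e.g. learning rates spanning several decades.
                params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                params[name] = rng.uniform(low, high)
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With an objective such as `(math.log10(p["lr"]) + 2) ** 2`, only `lr` affects the score and `momentum` is irrelevant. A grid with k values per axis spends k trials on each distinct `lr` value, whereas every random trial tries a new `lr`, which is the intuition the abstract gives for why random search wins when only a few hyper-parameters matter.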

6,935 citations

"Learning Program Embeddings to Prop..." refers methods in this paper

...We use random search (Bergstra & Bengio, 2012) to optimize over hyperparameters (e.g., regularization parameters, matrix dimensions, and minibatch size)....

Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Richard Socher¹, Alex Perelygin, Jean Y. Wu¹, Jason Chuang², Christopher D. Manning¹, Andrew Y. Ng¹, Christopher Potts¹

Stanford University¹, University of Washington²

01 Oct 2013

TL;DR: Introduces a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network.

Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
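The abstract introduces the Recursive Neural Tensor Network without stating its composition function. As recalled from the paper (treat the details as an assumption to check against it), each parent vector is computed from its two children a and b as p_k = tanh(xᵀ V[k] x + (W x)_k) with x = [a; b], where V holds one 2d × 2d tensor slice per output unit and W is a d × 2d matrix. The sketch omits any bias term and the sentiment classifier applied at each node.

```python
import math


def rntn_compose(a, b, V, W):
    """Compose child vectors a and b (length d) into a parent vector of length d.

    V: list of d matrices, each 2d x 2d (one tensor slice per output unit).
    W: d x 2d matrix. Output unit k is tanh(x^T V[k] x + (W x)[k]) with x = [a; b].
    """
    x = list(a) + list(b)
    n = len(x)
    parent = []
    for k in range(len(a)):
        bilinear = sum(x[i] * V[k][i][j] * x[j] for i in range(n) for j in range(n))
        linear = sum(W[k][j] * x[j] for j in range(n))
        parent.append(math.tanh(bilinear + linear))
    return parent
```

With every tensor slice set to zero, the function reduces to tanh(W x), the composition of a standard recursive neural network; the bilinear term is what lets the two children interact multiplicatively.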

6,792 citations

"Learning Program Embeddings to Prop..." refers background or methods in this paper

...The programs for these assignments operate in maze worlds where an agent can move, turn, and test for conditions of its current location....
[...]
...Our models are related to recent work from the NLP and deep learning communities on recursive neural networks, particularly for modeling semantics in sentences or symbolic expressions (Socher et al., 2013; 2011; Zaremba et al., 2014; Bowman, 2013)....
[...]
...on recursive neural networks (called the NPM-RNN model) in which we parametrize a matrix M_A in this new model with an RNN whose architecture follows the abstract syntax tree (similar to the way in which RNN architectures might take the form of a parse tree in an NLP setting (Socher et al., 2013))....

Book

A complexity measure

Thomas J. McCabe

04 Oct 1993

TL;DR: The paper presents a graph-theoretic complexity measure for managing and controlling program complexity and shows that the complexity is independent of physical size and depends only on the decision structure of a program.

Abstract: This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual FORTRAN programs are then presented to illustrate the correlation between intuitive complexity and the graph theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program. The issue of using non-structured control flow is also discussed. A characterization of non-structured control graphs is given and a method of measuring the “structuredness” of a program is developed. The relationship between structure and reducibility is illustrated with several examples. The last section of the paper deals with a testing methodology used in conjunction with the complexity measure; a testing strategy is defined that dictates that a program can either admit of a certain minimal testing level or the program can be structurally reduced.
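The abstract does not state the formula. McCabe's measure is the cyclomatic number of the control graph, V(G) = E - N + 2P, for E edges, N nodes, and P connected components. The sketch below computes it with a small union-find to count components; representing a control graph as a node list plus an edge list is an assumption of this example.

```python
def cyclomatic_complexity(nodes, edges):
    """McCabe's V(G) = E - N + 2P for a control graph given as nodes and edges.

    Edges are treated as undirected when counting connected components P.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * components


# An if/else diamond: entry branches to two arms that rejoin at exit.
if_else = cyclomatic_complexity([0, 1, 2, 3], [(0, 1), (0, 2), (1, 3), (2, 3)])
```

Appending a sequential statement adds one node and one edge, so V(G) is unchanged, which matches the abstract's claim that complexity is independent of physical size; a structured decision such as an if/else or a while loop raises V(G) by one.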

5,171 citations