Evaluation Methodologies for Code Learning Tasks

Posted Content•

Evaluation Methodologies for Code Learning Tasks

Pengyu Nie, Jiyang Zhang, Junyi Jessy Li, Raymond J. Mooney, Milos Gligoric - Show less +1 more

22 Aug 2021-arXiv: Software Engineering-

TL;DR: In this paper, a time-segmented evaluation methodology is proposed to evaluate code-comment pairs with timestamps to train and evaluate several recent code learning ML models for the comment generation and method naming tasks.

read less

Abstract: There has been a growing interest in developing machine learning (ML) models for code learning tasks, e.g., comment generation and method naming. Despite substantial increase in the effectiveness of ML models, the evaluation methodologies, i.e., the way people split datasets into training, validation, and testing sets, were not well designed. Specifically, no prior work on the aforementioned topics considered the timestamps of code and comments during evaluation (e.g., examples in the testing set might be from 2010 and examples from the training set might be from 2020). This may lead to evaluations that are inconsistent with the intended use cases of the ML models. In this paper, we formalize a novel time-segmented evaluation methodology, as well as the two methodologies commonly used in the literature: mixed-project and cross-project. We argue that time-segmented methodology is the most realistic. We also describe various use cases of ML models and provide a guideline for using methodologies to evaluate each use case. To assess the impact of methodologies, we collect a dataset of code-comment pairs with timestamps to train and evaluate several recent code learning ML models for the comment generation and method naming tasks. Our results show that different methodologies can lead to conflicting and inconsistent results. We invite the community to adopt the time-segmented evaluation methodology.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Learning to Describe Solutions for Bug Reports Based on Developer Discussions

[...]

01 Jan 2022

TL;DR: In this article , the authors propose to generate a concise natural language description of the solution by synthesizing relevant content within the discussion, which encompasses both natural language and source code, and design two systems for generating a description during an ongoing discussion by classifying when sufficient context for performing the task emerges.

...read moreread less

Abstract: When a software bug is reported, developers engage in a discussion to collaboratively resolve it. While the solution is likely formulated within the discussion, it is often buried in a large amount of text, making it difficult to comprehend and delaying its implementation. To expedite bug resolution, we propose generating a concise natural language description of the solution by synthesizing relevant content within the discussion, which encompasses both natural language and source code. We build a corpus for this task using a novel technique for obtaining noisy supervision from repository changes linked to bug reports, with which we establish benchmarks. We also design two systems for generating a description during an ongoing discussion by classifying when sufficient context for performing the task emerges in real-time. With automated and human evaluation, we find this task to form an ideal testbed for complex reasoning in long, bimodal dialogue context.

...read moreread less

Posted Content•

Learning to Describe Solutions for Bug Reports Based on Developer Discussions

[...]

Sheena Panthaplackel¹, Junyi Jessy Li¹, Milos Gligoric¹, Raymond J. Mooney¹•Institutions (1)

University of Texas at Austin¹

08 Oct 2021-arXiv: Computation and Language

TL;DR: In this paper, the authors propose to generate a concise natural language description of the solution by synthesizing relevant content within the discussion, which encompasses both natural language and source code, and determine when sufficient context about the solution emerges in real-time.

...read moreread less

Abstract: When a software bug is reported, developers engage in a discussion to collaboratively resolve it. While the solution is likely formulated within the discussion, it is often buried in a large amount of text, making it difficult to comprehend, which delays its implementation. To expedite bug resolution, we propose generating a concise natural language description of the solution by synthesizing relevant content within the discussion, which encompasses both natural language and source code. Furthermore, to support generating an informative description during an ongoing discussion, we propose a secondary task of determining when sufficient context about the solution emerges in real-time. We construct a dataset for these tasks with a novel technique for obtaining noisy supervision from repository changes linked to bug reports. We establish baselines for generating solution descriptions, and develop a classifier which makes a prediction following each new utterance on whether or not the necessary context for performing generation is available. Through automated and human evaluation, we find these tasks to form an ideal testbed for complex reasoning in long, bimodal dialogue context.

...read moreread less

Evaluation Methodologies for Code Learning Tasks

Citations

References

Related Papers (5)