Automatic query reformulations for text retrieval in software engineering
Sonia Haiduc, Gabriele Bavota, Andrian Marcus, Rocco Oliveto, Andrea De Lucia, Tim Menzies
- pp 842-851
TLDR
A machine-learning-based recommender (called Refoqus) is proposed. It is trained with a sample of queries and relevant results and, for a given query, automatically recommends a reformulation strategy that should improve the query's performance, based on the properties of the query.
Abstract:
There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve the query's performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems in Java and C++, and it is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines, and its recommendations lead to query performance improvement or preservation in 84% of the cases (on average).
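The train-then-recommend loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two query properties (term count and average inverse document frequency), the strategy labels "expand" and "reduce", and the nearest-centroid learner, which is a simple stand-in and not the learner used by Refoqus. In practice the training labels would come from measuring which strategy helped each sample query against its known relevant results; here they are supplied by hand.

```python
from collections import defaultdict
from math import dist, log


def query_properties(query, corpus):
    """Return (term count, average IDF) for a query.

    These two properties are hypothetical examples of query properties,
    not the set used by Refoqus.
    """
    terms = query.lower().split()
    doc_terms = [set(doc.lower().split()) for doc in corpus]
    idfs = []
    for term in terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for words in doc_terms if term in words)
        # Smoothed IDF so that unseen terms do not divide by zero.
        idfs.append(log((len(corpus) + 1) / (df + 1)))
    avg_idf = sum(idfs) / len(idfs) if idfs else 0.0
    return (float(len(terms)), avg_idf)


def train(samples):
    """Compute one centroid per strategy from (properties, strategy) pairs."""
    groups = defaultdict(list)
    for properties, strategy in samples:
        groups[strategy].append(properties)
    return {
        strategy: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for strategy, vectors in groups.items()
    }


def recommend(model, properties):
    """Recommend the strategy whose centroid is closest to the query."""
    return min(model, key=lambda strategy: dist(model[strategy], properties))


if __name__ == "__main__":
    # Toy corpus and hand-labelled training queries (illustrative only).
    corpus = [
        "parse config file",
        "open network socket",
        "render button widget",
        "parse xml file",
    ]
    labelled = [
        ("file", "expand"),
        ("parse", "expand"),
        ("open network socket render button widget", "reduce"),
        ("parse config file open network socket", "reduce"),
    ]
    model = train([(query_properties(q, corpus), s) for q, s in labelled])
    print(recommend(model, query_properties("xml", corpus)))
```

The sketch keeps the same division of labor as the abstract: properties are computed from the query alone, the model is fitted once on sample queries, and a recommendation needs no relevance feedback at query time.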
Citations
Proceedings Article
Deep code search
TL;DR: A novel deep neural network named CODEnn (Code-Description Embedding Neural Network) is proposed, which jointly embeds code snippets and natural language descriptions into a high-dimensional vector space, in such a way that a code snippet and its corresponding description have similar vectors.
Journal Article
The use of machine learning algorithms in recommender systems: A systematic review
TL;DR: The study concludes that Bayesian and decision tree algorithms are widely used in recommender systems because of their relative simplicity, and that requirement and design phases of recommender system development appear to offer opportunities for further research.
Posted Content
The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review
TL;DR: This paper presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies opportunities for software engineering research. It concludes that Bayesian and decision tree algorithms are widely used in recommender systems because of their relative simplicity, and that the requirement and design phases of recommender system development appear to offer opportunities for further research.
Proceedings Article
From word embeddings to document similarities for improved information retrieval in software engineering
TL;DR: This paper proposes bridging the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space and shows that the learned vector space embeddings lead to improvements in a previously explored bug localization task and a newly introduced task of linking API documents to computer programming questions.
Proceedings Article
CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E)
TL;DR: This paper proposes CodeHow, a code search technique that can recognize potential APIs a user query refers to and performs code retrieval by applying the Extended Boolean model, which considers the impact of both text similarity and potential APIs on code search.
References
Journal Article
Classification and Regression Trees.
Journal Article
A Simple Sequentially Rejective Multiple Test Procedure
TL;DR: In this paper, a simple and widely accepted multiple test procedure of the sequentially rejective type is presented, i.e., hypotheses are rejected one at a time until no further rejections can be made.
Journal Article
Classification and regression trees
TL;DR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
Book
Classification and regression trees
TL;DR: The methodology used to construct tree-structured rules is the focus of this monograph, which covers the use of trees as a data analysis method and, in a more mathematical framework, proves some of their fundamental properties.