Open Access Proceedings Article

Automatic query reformulations for text retrieval in software engineering

TLDR
A machine-learning-based recommender (called Refoqus) is proposed. It is trained with a sample of queries and relevant results and, for a given query, automatically recommends a reformulation strategy that should improve the query's performance, based on the properties of the query.
Abstract
There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve the query's performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The evaluation data corresponds to changes from five open source systems written in Java and C++ and is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines, and its recommendations led to query performance improvement or preservation in 84% of the cases (on average).
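
The idea of recommending a reformulation strategy from query properties can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the query properties, the reformulation strategies, or the learning algorithm, so the sketch uses two made-up properties (query length and average inverse document frequency), three hypothetical strategy labels, and a plain k-nearest-neighbors vote as a stand-in classifier. It is not the Refoqus implementation.

```python
from collections import Counter
import math

# Hypothetical strategy labels; the abstract does not name the strategies.
STRATEGIES = ("expand", "reduce", "keep")


def query_properties(query, doc_freq, n_docs):
    """Map a query to a small numeric property vector.

    The two properties (term count, average IDF) are illustrative
    assumptions, not the properties used by Refoqus.
    """
    terms = query.lower().split()
    # Inverse document frequency as a rough measure of term specificity.
    idfs = [math.log(n_docs / (1 + doc_freq.get(t, 0))) for t in terms]
    avg_idf = sum(idfs) / len(idfs) if idfs else 0.0
    return (float(len(terms)), avg_idf)


def recommend(query_vec, training, k=3):
    """Return the majority strategy among the k nearest training queries.

    `training` is a list of (property_vector, best_strategy) pairs, where
    the label is assumed to be the strategy that performed best for that
    training query against its known relevant results.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query_vec))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training sample: short, unspecific queries were best expanded;
# long queries were best reduced.
training = [
    ((2.0, 0.5), "expand"),
    ((2.0, 0.6), "expand"),
    ((12.0, 3.0), "reduce"),
    ((11.0, 2.8), "reduce"),
    ((5.0, 2.0), "keep"),
]

print(recommend((3.0, 0.4), training))
print(recommend((12.0, 3.1), training))
```

In this sketch the training labels are assumed to come from trying each strategy on each training query and keeping the one that ranks the known relevant results best. That is one plausible reading of "trained with a sample of queries and relevant results", not a confirmed detail.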


Citations
Proceedings Article

Deep code search

TL;DR: A novel deep neural network named CODEnn (Code-Description Embedding Neural Network) is proposed, which jointly embeds code snippets and natural language descriptions into a high-dimensional vector space, in such a way that a code snippet and its corresponding description have similar vectors.
Journal Article

The use of machine learning algorithms in recommender systems: A systematic review

TL;DR: The study concludes that Bayesian and decision tree algorithms are widely used in recommender systems because of their relative simplicity, and that requirement and design phases of recommender system development appear to offer opportunities for further research.
Posted Content

The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review

TL;DR: In this paper, the authors present a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies opportunities for software engineering research. They conclude that Bayesian and decision tree algorithms are widely used in recommender systems because of their relative simplicity, and that the requirement and design phases of recommender system development appear to offer opportunities for further research.
Proceedings Article

From word embeddings to document similarities for improved information retrieval in software engineering

TL;DR: This paper proposes bridging the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space and shows that the learned vector space embeddings lead to improvements in a previously explored bug localization task and a newly introduced task of linking API documents to computer programming questions.
Proceedings Article

CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E)

TL;DR: This paper proposes CodeHow, a code search technique that can recognize potential APIs a user query refers to and performs code retrieval by applying the Extended Boolean model, which considers the impact of both text similarity and potential APIs on code search.
References
Journal Article

A Simple Sequentially Rejective Multiple Test Procedure

TL;DR: In this paper, a simple and widely accepted multiple test procedure of the sequentially rejective type is presented, i.e. hypotheses are rejected one at a time until no further rejections can be made.
Journal Article

Classification and regression trees

TL;DR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
Book

Classification and regression trees

Leo Breiman
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.