Automatic query reformulations for text retrieval in software engineering

doi:10.5555/2486788.2486898

Open Access Proceedings Article

pp. 842-851

TLDR

The paper proposes Refoqus, a machine-learning-based recommender that is trained with a sample of queries and their relevant results and, based on the properties of a given query, automatically recommends a reformulation strategy that should improve the query's performance.
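The recommendation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Refoqus itself: it assumes each training query is described by a small numeric vector of query properties, that a training query's label is whichever candidate strategy ranked the first relevant result highest, and it uses a k-nearest-neighbour vote as a stand-in learner. The strategy names (`no_change`, `expand`, `reduce`), the features, the data, and the names `TrainingQuery` and `recommend` are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical reformulation strategies; the abstract does not name Refoqus's actual ones.
STRATEGIES = ("no_change", "expand", "reduce")


@dataclass
class TrainingQuery:
    features: tuple[float, ...]  # hypothetical query properties, e.g. (term count, mean IDF)
    ranks: dict[str, int]        # rank of first relevant result under each strategy (1 = best)

    def best_strategy(self) -> str:
        # Label = strategy with the lowest (best) rank; ties resolved by STRATEGIES order.
        return min(STRATEGIES, key=lambda s: (self.ranks[s], STRATEGIES.index(s)))


def recommend(training: list[TrainingQuery], features: tuple[float, ...], k: int = 3) -> str:
    """Recommend a strategy for a new query by a k-nearest-neighbour majority vote."""
    if not training:
        raise ValueError("training set is empty")
    # The k training queries whose property vectors are closest (Euclidean distance).
    nearest = sorted(training, key=lambda q: math.dist(q.features, features))[:k]
    votes = Counter(q.best_strategy() for q in nearest)
    top = max(votes.values())
    # Deterministic tie-break: first strategy in STRATEGIES with the top vote count.
    return next(s for s in STRATEGIES if votes[s] == top)


# Made-up training sample: feature vector plus first-relevant rank under each strategy.
training = [
    TrainingQuery((2.0, 1.0), {"no_change": 40, "expand": 5, "reduce": 60}),
    TrainingQuery((3.0, 1.2), {"no_change": 30, "expand": 8, "reduce": 50}),
    TrainingQuery((12.0, 4.0), {"no_change": 20, "expand": 25, "reduce": 3}),
    TrainingQuery((11.0, 3.5), {"no_change": 15, "expand": 30, "reduce": 4}),
    TrainingQuery((6.0, 2.5), {"no_change": 2, "expand": 9, "reduce": 7}),
]

print(recommend(training, (2.5, 1.1)))   # expand
print(recommend(training, (11.5, 3.8)))  # reduce
```

A nearest-neighbour vote keeps the sketch dependency-free; any classifier that maps query properties to a strategy label, such as a decision tree, could fill the same role.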

Abstract:

There are more than twenty distinct software engineering tasks that are addressed with text retrieval (TR) techniques, such as traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the retrieval results depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The evaluation data corresponds to changes from five open source systems written in Java and C++, and it is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines, and its recommendations led to query performance improvement or preservation in 84% of the cases on average.
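The headline result counts a recommendation as successful when the reformulated query performs at least as well as the original. A minimal sketch of that bookkeeping follows. It assumes, since the abstract does not define it, that query performance is the rank of the first relevant source code element in the result list (lower is better) and that "on average" means the mean of per-system rates. The names `QueryOutcome`, `classify`, `improved_or_preserved_rate`, and `average_rate`, the system names, and all ranks are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class QueryOutcome:
    # Hypothetical performance measure: rank of the first relevant result (1 = best).
    rank_before: int  # original query
    rank_after: int   # query after applying the recommended reformulation


def classify(outcome: QueryOutcome) -> str:
    """Label one query as improved, preserved, or worsened by the reformulation."""
    if outcome.rank_after < outcome.rank_before:
        return "improved"
    if outcome.rank_after == outcome.rank_before:
        return "preserved"
    return "worsened"


def improved_or_preserved_rate(outcomes: list[QueryOutcome]) -> float:
    """Fraction of queries whose performance did not get worse."""
    if not outcomes:
        raise ValueError("no outcomes to evaluate")
    ok = sum(1 for o in outcomes if classify(o) != "worsened")
    return ok / len(outcomes)


def average_rate(per_system: dict[str, list[QueryOutcome]]) -> float:
    """Assumed averaging: the mean of the per-system rates (a macro average)."""
    return mean(improved_or_preserved_rate(o) for o in per_system.values())


# Made-up example data for two hypothetical systems.
results = {
    "system_a": [QueryOutcome(10, 3), QueryOutcome(5, 5),
                 QueryOutcome(2, 7), QueryOutcome(20, 1)],
    "system_b": [QueryOutcome(4, 4), QueryOutcome(9, 2)],
}
print(improved_or_preserved_rate(results["system_a"]))  # 0.75
print(average_rate(results))                            # 0.875
```

A macro average weights each system equally regardless of how many queries it contributes; pooling all queries into a single rate would weight larger systems more, so the two figures can differ.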

Citations

Deep code search

The use of machine learning algorithms in recommender systems: A systematic review

From word embeddings to document similarities for improved information retrieval in software engineering

CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E)

Related Papers

Introduction to Information Retrieval

An information retrieval approach to concept location in source code

A Survey of Automatic Query Expansion in Information Retrieval

Portfolio: finding relevant functions and their usage

Feature location in source code: a taxonomy and survey