What future works have the authors mentioned in the paper "Feature location in source code: a taxonomy and survey" ?

The taxonomy facilitates the comparison of existing feature location techniques and illuminates possible areas of future research.

What was used to assess task difficulty?

The NASA Task Load Index (TLX) was used to assess task difficulty, and distance profiles were used to gauge the degree to which the participants remained on-task.

What is another way to evaluate a feature location approach?

Another way to evaluate a feature location approach is to have system experts, or even non-experts, assess the results, an evaluation method often used for IR-based search engines.

What are the approaches to establish the mapping between the description of a feature and the source code?

The approaches to establish the mapping between the description of the feature and the source code include textual search with grep [Petrenko'08], Information Retrieval [Cleary'09, Gay'09, Marcus'04, Poshyvanyk'07b], and natural language processing [Hill'09, Shepherd'07].
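The Information Retrieval route can be illustrated with a deliberately simple stand-in. The sketch below ranks methods by TF-IDF cosine similarity between a feature description and each method's identifiers and comments, flattened to text. It does not reproduce the specific IR models used in the cited work. The method names, their text, and the helper `rank_methods` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split camelCase identifiers into words and lowercase everything."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_methods(query: str, corpus: dict[str, str]) -> list[str]:
    """Rank methods by TF-IDF cosine similarity to a feature description."""
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    # Document frequency: in how many methods each term occurs.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF, so terms found in every method still get a small weight.
    idf = {term: math.log(len(corpus) / n) + 1.0 for term, n in df.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in c.items()}
        for name, c in counts.items()
    }
    # Query terms that never occur in the corpus get zero weight.
    query_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = {name: cosine(query_vec, vec) for name, vec in vectors.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))


corpus = {  # hypothetical methods: identifiers and comments as plain text
    "Editor.saveFile": "saveFile write buffer to disk file path",
    "Editor.openFile": "openFile read file from disk into buffer",
    "Printer.printPage": "printPage send page to printer queue",
}
print(rank_methods("save file to disk", corpus))
```

A plain grep search would only return methods that contain a literal match. The ranked list also orders partial matches, such as `Editor.openFile`, by how much vocabulary they share with the query.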

Why are the results more fine-grained than those of other FLTs?

Because the input program elements are more fine-grained (i.e., variables), the results are also more fine-grained than those of other FLTs.

Why are there no papers in the survey and experiment categories?

Because the feature location field is not as mature as other software engineering fields, there are no papers that fall into the survey and experiment categories.

What did the authors do to improve their taxonomy and attribute set?

Through this process, the authors were able to improve the quality of their taxonomy and attribute set as well as their descriptions.

What types of datasets could be used to evaluate a feature location technique?

These datasets could contain a list of features, textual description or documentation about the features, mappings between features or bugs and program elements that are relevant to fixing the bug or implementing the feature (referred to as gold sets in the literature), patches submitted to an issue tracker, etc.
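To show how a gold set can be used, here is a minimal sketch. It assumes a technique returns a ranked list of methods and that the gold set is a plain set of method names. It reports precision and recall within a cutoff, plus the rank of the first relevant method, which reflects how many irrelevant methods a developer would inspect first. All method names and the helper `evaluate_against_gold_set` are hypothetical.

```python
from typing import Optional


def evaluate_against_gold_set(
    ranked: list[str], gold: set[str], cutoff: int
) -> tuple[float, float, Optional[int]]:
    """Compare a ranked result list with a gold set of relevant methods.

    Returns (precision@cutoff, recall@cutoff, 1-based rank of the first
    relevant method in the full list, or None if none was returned).
    """
    top = ranked[:cutoff]
    hits = [method for method in top if method in gold]
    precision = len(hits) / len(top) if top else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    first_rank = next((i for i, m in enumerate(ranked, start=1) if m in gold), None)
    return precision, recall, first_rank


ranked = ["Parser.init", "Editor.saveFile", "Editor.openFile", "Printer.printPage", "Disk.flush"]
gold = {"Editor.saveFile", "Disk.flush"}  # hypothetical gold set for one feature
print(evaluate_against_gold_set(ranked, gold, cutoff=3))
```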

How many functions did the developer need to investigate to partially comprehend the system?

The results show that among the 984 functions in Mosaic, the developer performing concept location on a maintenance task was able to partially comprehend the system by investigating only 22 (2%) of the functions.

Feature location in source code: a taxonomy and survey (2013) | Bogdan Dit

CRC to Journal of Software Maintenance and Evolution: Research and Practice

Feature Location in Source Code: A Taxonomy and Survey

Bogdan Dit, Meghan Revelle, Malcom Gethers, Denys Poshyvanyk

The College of William and Mary

________________________________________________________________________

Feature location is the activity of identifying an initial location in the source code that implements functionality in a software system. Many feature location techniques have been introduced that automate some or all of this process, and a comprehensive overview of this large body of work would be beneficial to researchers and practitioners. This paper presents a systematic literature survey of feature location techniques. Eighty-nine articles from 25 venues have been reviewed and classified within the taxonomy in order to organize and structure existing work in the field of feature location. The paper also discusses open issues and defines future directions in the field of feature location.

Keywords: Feature location, concept location, program comprehension, software maintenance and evolution

________________________________________________________________________

1. INTRODUCTION

In software systems, a feature represents a functionality that is defined by requirements and accessible to developers and users. Software maintenance and evolution involves adding new features to programs, improving existing functionalities, and removing bugs, which is analogous to removing unwanted functionalities. Identifying an initial location in the source code that corresponds to a specific functionality is known as feature (or concept) location [Biggerstaff'94, Rajlich'02]. It is one of the most frequent maintenance activities undertaken by developers because it is a part of the incremental change process [Rajlich'04]. During the incremental change process, programmers use feature location to find where in the code the first change to complete a task needs to be made. The full extent of the change is then handled by impact analysis, which starts with the source code identified by feature location and finds all the code affected by the change. Methodologically, the two activities of feature location and impact analysis are different and are treated separately in the literature and in this survey.

Feature location is one of the most important and common activities performed by programmers during software maintenance and evolution. No maintenance activity can be completed without first locating the code that is relevant to the task at hand. This makes feature location, which is performed in the context of incremental change, essential to software maintenance. For example, Alice is a new developer on a software project, and her manager has given her the task of fixing a recently reported bug. Since Alice is new to this project, she is unfamiliar with the large code base of the software system and does not know where to begin. She lacks sufficient documentation on the system and is unable to ask the code’s original authors for help, so the only option Alice sees is to manually search for the code relevant to her task.

Alice’s situation is one faced by many software developers who need to understand and modify an unfamiliar codebase. However, a manual search of a large amount of source code, even with the help of tools such as pattern matchers or an integrated development environment, can be frustrating and time-consuming. Recognizing this problem, software engineering researchers have developed a number of feature location techniques (FLTs) to aid programmers in Alice’s position. The various techniques that have been introduced are all unique in terms of their input requirements, how they locate a feature’s implementation, and how they present their results. Thus, even the task of choosing a suitable feature location technique can be challenging.

The existence of such a large body of feature location research calls for a comprehensive overview. Since there is currently no broad summary of the field of feature location, this paper provides a systematic survey and operational taxonomy of this pertinent research area. To the best of our knowledge, Wilde et al. [Wilde'03] is the only other survey, and in contrast to ours, it compares only a few feature location techniques. Our survey includes research articles that introduce new feature location approaches; case, industrial, and user studies; and tools that can be used in support of feature location. The articles are characterized within a taxonomy that has nine dimensions, and each dimension has a set of attributes associated with it. The dimensions and attributes of the taxonomy capture key facets of typical feature location techniques and can be useful to both software engineering researchers and practitioners [Marcus'05b]. Researchers can use this survey to identify what has been done in the area of feature location and what remains to be done; that is, they can use it to find related work as well as opportunities for future research. Practitioners can use this overview to determine which feature location approach is most suited to their needs.

This survey encompasses 89 articles (60 research articles and 29 tool and case study papers) from 25 venues published between November 1992 and February 2011. These research articles were selected because they either state feature/concept location as their goal or present a technique that is essentially equivalent to feature location. The tool papers include tools developed specifically for feature location as well as program exploration tools that support feature location. The case study articles include industrial and user studies as well as studies that compare existing approaches.

There are several research areas that are closely related to feature location, such as traceability link recovery, impact analysis, and aspect mining. Traceability link recovery seeks to connect different types of software artifacts (e.g., documentation with source code), while feature location is more concerned with identifying source code associated with functionalities, not specific sections of a document. Impact analysis is the step in the incremental change process performed after feature location with the purpose of expanding on feature location’s results, especially after a change is made to the source code. Feature location focuses on finding the starting point for that change. The main goal of aspect mining is to identify cross-cutting concerns and determine the source code that should be refactored into aspects, meaning the aspects themselves are not known a priori. By contrast, in the contexts in which feature location is used, the high-level descriptions of features are already known and only the code that implements them is unknown. Therefore, articles and research from these related fields are not included here as they are beyond the scope of this focused survey.

The work presented in this paper has two main contributions. The first is a systematic survey of feature location techniques, relevant case studies, and tools. The second is the taxonomy derived from those techniques. An online appendix (http://www.cs.wm.edu/semeru/data/feature-location-survey/, accessed and verified on 03/01/2011) lists all of the surveyed articles classified within the taxonomy. Section 2 presents the systematic review process. Section 3 introduces the dimensions of the taxonomy, and Section 4 provides brief descriptions of the surveyed approaches. Section 5 overviews the feature location tools and studies, and Section 6 provides an analysis of the taxonomy. Section 7 discusses open issues in feature location, and Section 8 concludes.

2. SYSTEMATIC REVIEW PROCESS

In this paper we perform a systematic survey of the feature location literature in order to address the following research questions (RQ):

• RQ1: What types of analysis are used while performing feature location?
• RQ2: Has there been a change in the types of analysis employed by recent feature location techniques to identify features in source code?
• RQ3: Are there any limitations to current strategies for evaluating various feature location techniques?

In order to answer these research questions, we conducted a systematic review of the literature using the following process (see Figure 1):

• Search: the initial set of articles to be considered during the selection process is determined by identifying pertinent journals, conferences, and workshops.
• Article Selection: the initial set of articles is filtered using inclusion and exclusion criteria, and only relevant articles are considered beyond this step.
• Article Characterization: articles that meet the selection criteria are classified according to the set of attributes that capture important characteristics of feature location techniques.
• Analysis: using the resulting taxonomy and systematic classification of the papers, the research questions are answered and useful insights about the state of feature location research and practice are outlined.

2.1. Search

An initial subset of papers of interest was obtained by manually evaluating articles that appeared in the different venues considered during our preliminary exploration. We selected venues whose scope includes feature location research. Choosing such venues also ensures that selected articles meet some quality standard (e.g., the papers went through a rigorous peer review process).

2.2. Article Selection

To adhere to the properties of systematic reviews [Kitchenham'04], we define the following inclusion and exclusion criteria. In order to be included in the survey, a paper must introduce, evaluate, and/or complement the implementation of a source-code-based feature location technique. This includes papers that introduce novel feature location techniques, evaluate various existing feature location techniques, or present tools implementing existing or new approaches to feature location. Papers that focused on improving the performance of underlying analysis techniques (e.g., dynamic analysis, Information Retrieval), as opposed to the feature location process, were excluded.
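The inclusion and exclusion criteria can be read as a single predicate over a paper's contributions. The sketch below encodes them that way. The `Article` record, its boolean fields, and the candidate titles are hypothetical. In the survey itself, these judgments were made by reading each paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical record of what a candidate paper contributes."""
    title: str
    source_code_based: bool
    introduces_technique: bool = False
    evaluates_techniques: bool = False
    presents_tool: bool = False
    targets_underlying_analysis_only: bool = False  # e.g. a faster tracer


def is_included(article: Article) -> bool:
    # Inclusion: introduces, evaluates, or provides tooling for a
    # source-code-based feature location technique.
    contributes = (
        article.introduces_technique
        or article.evaluates_techniques
        or article.presents_tool
    )
    # Exclusion: work that only improves an underlying analysis
    # (dynamic analysis, IR, ...) rather than the feature location process.
    return (
        article.source_code_based
        and contributes
        and not article.targets_underlying_analysis_only
    )


candidates = [
    Article("Hypothetical FLT paper", True, introduces_technique=True),
    Article("Hypothetical tracing speed-up", True, introduces_technique=True,
            targets_underlying_analysis_only=True),
    Article("Hypothetical FL tool paper", True, presents_tool=True),
    Article("Hypothetical requirements-only study", False, evaluates_techniques=True),
]
selected = [a.title for a in candidates if is_included(a)]
print(selected)
```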

2.3. Article Classification

The authors read and categorized each article according to the taxonomy and the attributes presented in Section 3. Four authors carried out the classification individually. Using the initial classifications produced by the authors, we identified papers with disagreements and discussed those papers further. The set of attributes was extracted and defined by two of the authors. Having all four authors characterize the articles allows us to verify the quality of the taxonomy, minimizing potential bias. In certain cases, disagreements served as an indication that our taxonomy and attributes or their corresponding descriptions required refinement. Through this process we were able to improve the quality of our taxonomy and attribute set as well as their descriptions.
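Finding papers that need discussion amounts to comparing the raters' labels paper by paper. The sketch below does this under one assumption: a "disagreement" means at least two raters assigned different attribute sets to the same paper. That definition, the rater names, and the paper IDs are hypothetical.

```python
def find_disagreements(
    classifications: dict[str, dict[str, set[str]]]
) -> list[str]:
    """Return papers whose attribute sets differ between at least two raters.

    classifications maps rater -> paper -> set of assigned attributes. A paper
    that a rater did not classify counts as an empty set, so it is flagged too.
    """
    papers = set().union(*(per_rater.keys() for per_rater in classifications.values()))
    disputed = []
    for paper in sorted(papers):
        labels = {
            frozenset(per_rater.get(paper, set()))
            for per_rater in classifications.values()
        }
        if len(labels) > 1:  # more than one distinct labelling means disagreement
            disputed.append(paper)
    return disputed


ratings = {  # hypothetical raters and papers
    "rater1": {"P1": {"dynamic"}, "P2": {"static", "textual"}},
    "rater2": {"P1": {"dynamic"}, "P2": {"static", "textual"}},
    "rater3": {"P1": {"dynamic"}, "P2": {"textual"}},
    "rater4": {"P1": {"dynamic"}, "P2": {"static", "textual"}},
}
print(find_disagreements(ratings))
```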

2.4. Analysis

Following the classification of the research papers, our final step consists of analyzing the results, answering the research questions, and outlining future directions for researchers and practitioners investigating feature location techniques. In order to complete this step, we analyzed the trends in our resulting taxonomy and observed interesting co-occurrences of various attributes across feature location techniques. We also investigated characteristics that rarely apply to the set of techniques considered, as well as characteristics that are currently emerging in the research literature.
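Observing co-occurrences and rarely applied characteristics comes down to simple counting over the classification. The sketch below counts single attributes and unordered attribute pairs. The technique names are hypothetical. The attribute values are the three analysis types named in Section 3.1, and the assignments are purely illustrative.

```python
from collections import Counter
from itertools import combinations


def attribute_statistics(techniques: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Count how often each attribute, and each pair of attributes, occurs."""
    singles: Counter = Counter()
    pairs: Counter = Counter()
    for attributes in techniques.values():
        singles.update(attributes)
        # Sort so ("dynamic", "textual") and ("textual", "dynamic") coincide.
        pairs.update(combinations(sorted(attributes), 2))
    return singles, pairs


techniques = {  # hypothetical classification
    "FLT-A": {"dynamic", "textual"},
    "FLT-B": {"dynamic", "static"},
    "FLT-C": {"dynamic", "textual"},
    "FLT-D": {"textual"},
}
singles, pairs = attribute_statistics(techniques)
print(pairs.most_common(1))                         # most frequent combination
print([a for a, n in singles.items() if n == 1])    # rarely applied attributes
```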

3. DIMENSIONS OF THE SURVEY

The goal of this survey is to provide researchers and practitioners with a structured overview of existing research in the area of feature location. From a methodical inspection of the research literature, we extracted a number of key dimensions. These dimensions objectively describe different techniques and offer structure to the surveyed literature. The dimensions are as follows:

• The type of analysis: What underlying analyses are used to support feature location?
• The type of user input: What does a developer have to provide as an input to the feature location technique?
• Data sources: What derivative artifacts have to be provided as an input for the feature location technique?
• Output: What type of results does the technique produce, and how are they provided back to the user?
• Programming language support: On which programming languages was this technique instantiated?
• The evaluation of the approach: How was this feature location technique evaluated?
• Systems evaluated: What are the systems that were used in the evaluation?

Some of these dimensions were discussed at the working session on Information Retrieval Approaches in Software Evolution at the 22nd IEEE International Conference on Software Maintenance (ICSM’06): http://www.cs.wayne.edu/~amarcus/icsm2006/

[Figure 1: Systematic review process]

The order in which these dimensions are presented does not imply any explicit priority or importance.

Each dimension has a number of distinct attributes associated with it. For a given dimension, a feature location technique may be associated with multiple attributes. These dimensions and their attributes were derived by examining an initial set of articles of interest. They were then refined and generalized to succinctly characterize the properties that make feature location techniques unique, and they can be used to evaluate and compare these techniques. The goal of the taxonomy’s dimensions and attributes is to allow researchers and practitioners to easily locate the feature location techniques that are most suited to their needs. The dimensions and their associated attributes that are used in the taxonomy of the surveyed articles are listed in Table 1. These dimensions and attributes are discussed in the remainder of this section. The attributes are highlighted in italics.
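Because a technique can carry several attributes per dimension, a natural way to represent the taxonomy is a mapping from each technique to a set of attributes for every dimension. Locating suitable techniques then becomes a subset test. In the sketch below, the dimension names come from the list above and "dynamic", "static", and "textual" come from Section 3.1. The technique names and the language attributes are hypothetical, not entries from Table 1.

```python
# technique -> dimension -> attributes assigned to it in that dimension
Catalog = dict[str, dict[str, frozenset[str]]]


def find_techniques(catalog: Catalog, required: dict[str, frozenset[str]]) -> list[str]:
    """Names of techniques that carry every required attribute in each dimension."""
    return sorted(
        name
        for name, dimensions in catalog.items()
        if all(attrs <= dimensions.get(dim, frozenset()) for dim, attrs in required.items())
    )


catalog: Catalog = {  # hypothetical entries
    "FLT-A": {"type of analysis": frozenset({"dynamic", "textual"}),
              "programming language support": frozenset({"Java"})},
    "FLT-B": {"type of analysis": frozenset({"static"}),
              "programming language support": frozenset({"C"})},
    "FLT-C": {"type of analysis": frozenset({"dynamic"}),
              "programming language support": frozenset({"Java", "C++"})},
}
need = {"type of analysis": frozenset({"dynamic"}),
        "programming language support": frozenset({"Java"})}
print(find_techniques(catalog, need))
```

Requiring a subset, rather than exact equality, lets a practitioner ask for "uses dynamic analysis" without excluding hybrid techniques that also use textual analysis.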

3.1. Type of Analysis

A main distinguishing factor of feature location techniques is the type, or types, of analysis they employ to identify the code that pertains to a feature. The most common types of analysis are dynamic, static, and textual. While these are not the only types of analysis possible, they are the ones utilized by the vast majority of feature location techniques, and some approaches even leverage more than one of these types of analysis. In Section 4, descriptions of all the surveyed articles are given, and the section is organized by the type(s) of analysis used.

Dynamic analysis refers to examining a software system’s execution, and it is often used for feature location when features can be invoked and observed during runtime. Feature location using dynamic analysis generally relies on a post-mortem analysis of an execution trace. Typically, one or more feature-specific scenarios are developed that invoke only the desired feature. Then, the scenarios are run and execution traces are collected, recording information about the code that was invoked. These traces are captured either by instrumenting the system or through profiling. Once the traces are obtained, feature location can be performed in several ways. The traces can be compared to other traces in which the feature was not invoked to find code only invoked in the feature-specific traces [Eisenbarth'03, Wilde'95]. Alternatively, the execution frequency of portions of code can be analyzed to locate a feature’s implementation [Antoniol'06, Eisenberg'05, Safyallah'06]. Using dynamic analysis for feature location is a popular choice since most features can be mapped to execution scenarios. However, there are some limitations associated with dynamic analysis. The collection of traces can impose considerable overhead on a system’s execution. Additionally, the scenarios used to collect traces may not invoke all of the code that is relevant to the feature, meaning that some of the feature’s implementation may not be located. Conversely, it may be difficult to formulate a scenario that invokes only the desired feature, causing irrelevant code to be executed. Dynamic feature location techniques are discussed in Section 4.2.
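The two trace-based strategies can be sketched on already-collected traces. Instrumentation and profiling are not shown. The trace comparison below is a bare set difference in the spirit of the cited trace-comparison work. The frequency score, calls in feature traces divided by one plus calls in other traces, is an ad hoc heuristic of my own, not the formula of any cited technique. All method names and traces are hypothetical.

```python
from collections import Counter

Trace = list[str]  # sequence of executed method names, in call order


def feature_specific_methods(feature_traces: list[Trace], other_traces: list[Trace]) -> list[str]:
    """Methods executed in some feature trace but in none of the other traces."""
    in_feature = set().union(*feature_traces)
    elsewhere = set().union(*other_traces)
    return sorted(in_feature - elsewhere)


def rank_by_frequency(feature_traces: list[Trace], other_traces: list[Trace]) -> list[str]:
    """Rank feature-trace methods by calls there relative to calls elsewhere."""
    feature_calls = Counter(m for trace in feature_traces for m in trace)
    other_calls = Counter(m for trace in other_traces for m in trace)
    score = {m: n / (1 + other_calls[m]) for m, n in feature_calls.items()}
    # Highest score first; ties broken alphabetically.
    return sorted(score, key=lambda m: (-score[m], m))


feature_traces = [  # scenarios that invoke the hypothetical "save" feature
    ["main", "parse", "saveFile", "writeBytes", "writeBytes"],
    ["main", "saveFile", "writeBytes"],
]
other_traces = [["main", "parse", "openFile", "readBytes"]]
print(feature_specific_methods(feature_traces, other_traces))
print(rank_by_frequency(feature_traces, other_traces))
```

The example also shows the limitation noted above. Shared code such as `main` is dropped by the set difference, yet it still appears in the frequency ranking, although lower down.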

Static analysis examines structural information such as control or data flow dependencies. In manual feature location, developers may follow program dependencies in a section of code they deem to be relevant in order to find additional useful code, and
