Journal ArticleDOI

Software fault prediction metrics

TL;DR: Object-oriented and process metrics have been reported to be more successful in finding faults than traditional size and complexity metrics, and process metrics seem to be better at predicting post-release faults than any static code metrics.
Abstract: Context: Software metrics may be used in fault prediction models to improve software quality by predicting fault location. Objective: This paper aims to identify software metrics and to assess their applicability in software fault prediction. We investigated the influence of context on the selection and performance of metrics. Method: This systematic literature review includes 106 papers published between 1991 and 2011. The selected papers are classified according to metrics and context properties. Results: Object-oriented metrics (49%) were used nearly twice as often as traditional source code metrics (27%) or process metrics (24%). Chidamber and Kemerer's (CK) object-oriented metrics were the most frequently used. According to the selected studies, there are significant differences in fault prediction performance between metrics. Object-oriented and process metrics have been reported to be more successful in finding faults than traditional size and complexity metrics, and process metrics seem to be better at predicting post-release faults than any static code metrics. Conclusion: More studies should be performed on large industrial software systems to find metrics more relevant for industry and to answer the question of which metrics should be used in a given context.
Citations
Journal ArticleDOI
01 Feb 2015
TL;DR: Machine learning techniques have the ability to predict software fault proneness and can be used by software practitioners and researchers; however, their application in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results.
Abstract: Reviews studies from 1991 to 2013 to assess the application of ML techniques for SFP. Identifies seven categories of ML techniques. Identifies 64 studies to answer the established research questions. Selects primary studies according to a quality assessment of the studies. The systematic literature review performs the following: summarizes ML techniques for SFP models; assesses the performance accuracy and capability of ML techniques for constructing SFP models; compares ML techniques with statistical techniques; compares the performance accuracy of different ML techniques; summarizes the strengths and weaknesses of ML techniques; and provides future guidelines to software practitioners and researchers. Background: Software fault prediction is the process of developing models that can be used by software practitioners in the early phases of the software development life cycle to detect faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults. Method: In this study we perform a systematic review of studies from January 1991 to October 2013 that use machine learning techniques for software fault prediction. We assess the performance capability of the machine learning techniques in existing research for software fault prediction. We also compare the performance of machine learning techniques with statistical techniques and with other machine learning techniques. Further, the strengths and weaknesses of machine learning techniques are summarized. Results: We identified 64 primary studies and seven categories of machine learning techniques. The results prove the prediction capability of machine learning techniques for classifying a module or class as fault-prone or not fault-prone. Models using machine learning techniques for estimating software fault proneness outperform traditional statistical models. Conclusion: Based on the results obtained from the systematic review, we conclude that machine learning techniques have the ability to predict software fault proneness and can be used by software practitioners and researchers. However, their application in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work.
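
To make this concrete, here is a minimal sketch of the kind of SFP model the review covers: a Naive Bayes classifier (one of the commonly reviewed ML techniques) trained on module-level metrics and evaluated with cross-validation. The CSV file and column names are hypothetical placeholders, not taken from any reviewed study.

```python
# Minimal sketch of an ML-based software fault prediction (SFP) model.
# "module_metrics.csv" and its column names are hypothetical; any table
# of per-module metrics with a binary fault label would work.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

data = pd.read_csv("module_metrics.csv")  # one row per module
X = data.drop(columns=["faulty"])         # metric features
y = data["faulty"]                        # 1 = fault-prone, 0 = not

# Naive Bayes: a simple, frequently benchmarked classifier for SFP.
scores = cross_val_score(GaussianNB(), X, y, cv=10, scoring="roc_auc")
print(f"Mean AUC over 10 folds: {scores.mean():.3f}")
```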

483 citations


Cites background from "Software fault prediction metrics"

  • ...The existing literature reviews aim to answer various research questions [3-5], but to the best of the authors' knowledge there is no systematic review that focuses on ML techniques for software fault prediction i....


Book ChapterDOI
Jana Polgar
01 Jan 2005

394 citations

Journal ArticleDOI
Peng He, Bing Li, Xiao Liu, Jun Chen, Yutao Ma
TL;DR: The experimental results indicate that the choice of training data for defect prediction should depend on the specific accuracy requirement, and that a minimum metric subset can be identified to facilitate general defect prediction with an acceptable loss of prediction precision in practice.
Abstract: Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. Objective: The objective of this work is to validate the feasibility of a predictor built with a simplified metric set for software defect prediction in different scenarios, and to investigate practical guidelines for the choice of training data, classifier, and metric subset for a given project. Method: First, based on six typical classifiers, three types of predictors, distinguished by the size of their metric sets, were constructed in three scenarios. Then, we validated the acceptable performance of the predictor based on Top-k metrics using statistical methods. Finally, we attempted to minimize the Top-k metric subset by removing redundant metrics, and we tested the stability of such a minimum metric subset with one-way ANOVA tests. Results: The study was conducted on 34 releases of 10 open-source projects available in the PROMISE repository. The findings indicate that predictors built with either Top-k metrics or the minimum metric subset can provide acceptable results compared with benchmark predictors. A guideline for choosing a suitable simplified metric set in different scenarios is presented in Table 12 of the paper. Conclusion: The experimental results indicate that (1) the choice of training data for defect prediction should depend on the specific accuracy requirement; (2) a predictor built with a simplified metric set works well and is very useful when limited resources are supplied; (3) simple classifiers (e.g., Naive Bayes) also tend to perform well when using a simplified metric set for defect prediction; and (4) in several cases, a minimum metric subset can be identified to facilitate general defect prediction with an acceptable loss of prediction precision in practice.
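
The Top-k approach can be sketched as a two-stage pipeline: rank the candidate metrics, keep the k highest-ranked ones, and train a simple classifier on the reduced set. The ranking criterion used below (ANOVA F-score) and the synthetic data are illustrative assumptions, not necessarily the paper's exact choices.

```python
# Sketch of a "Top-k metrics" predictor: rank metrics, keep the k best,
# train a simple classifier. The ANOVA F-score ranking is one plausible
# criterion; the data is a synthetic stand-in for real project metrics.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# 500 "modules" described by 20 "metrics", with a binary fault label.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for k in (3, 5, 10):
    predictor = make_pipeline(SelectKBest(f_classif, k=k), GaussianNB())
    auc = cross_val_score(predictor, X, y, cv=5, scoring="roc_auc").mean()
    print(f"Top-{k} metrics: mean AUC = {auc:.3f}")
```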

255 citations


Cites background or methods from "Software fault prediction metrics"

  • ...Keywords: defect prediction, software metrics, metric set simplification, software quality...


  • ...The results revealed that their approach could build 64% more useful predictors than both within-company and cross-company approaches based on Burak filters, and demonstrated that cross-company defect prediction was able to be applied very early in a project’s lifecycle....


Journal ArticleDOI
TL;DR: CPDP is still a challenge and requires more research before trustworthy applications can take place; this work synthesises the literature to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets, and associated performance.
Abstract: Background: Cross-project defect prediction (CPDP) has recently gained considerable attention, yet there are no systematic efforts to analyse the existing empirical evidence. Objective: To synthesise the literature to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets, and associated performance. Further, we aim to assess the performance of CPDP versus within-project DP models. Method: We conducted a systematic literature review. Results from primary studies are synthesised (thematically and by meta-analysis) to answer the research questions. Results: We identified 30 primary studies passing quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, f-measure, and AUC are the most common measures. Models based on Nearest Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular Naive Bayes yields average performance. Performance of ensembles varies greatly across f-measure and AUC. Data approaches address CPDP challenges using row/column processing, which improves CPDP in terms of recall at the cost of precision. This is observed on multiple occasions, including in the meta-analysis of CPDP versus WPDP. The NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently. Conclusion: CPDP is still a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research.
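
The core contrast the review synthesises, within-project (WPDP) versus cross-project (CPDP) prediction, comes down to where the training data originates. A sketch under stated assumptions: the two "projects" below are random stand-ins sharing a metric schema, whereas the primary studies draw on real repositories such as NASA and Jureczko.

```python
# Sketch contrasting within-project (WPDP) and cross-project (CPDP)
# defect prediction. Both "projects" are synthetic stand-ins.
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
Xa, ya = rng.normal(size=(300, 10)), rng.integers(0, 2, 300)  # project A
Xb, yb = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)  # project B

clf = DecisionTreeClassifier(random_state=0)

# WPDP: train and test inside project B via cross-validation.
wpdp_f1 = cross_val_score(clf, Xb, yb, cv=5, scoring="f1").mean()

# CPDP: train on project A, then predict project B.
cpdp_f1 = f1_score(yb, clf.fit(Xa, ya).predict(Xb))

print(f"WPDP f1 = {wpdp_f1:.3f}, CPDP f1 = {cpdp_f1:.3f}")
```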

246 citations

Journal ArticleDOI
TL;DR: Imbalanced learning should only be considered for moderately or highly imbalanced SDP data sets, and the appropriate combination of imbalanced learning method and classifier needs to be chosen carefully to ameliorate the imbalanced learning problem for SDP.
Abstract: Context: Software defect prediction (SDP) is an important challenge in the field of software engineering, hence much research work has been conducted, most notably through the use of machine learning algorithms. However, class imbalance, typified by few defective components and many non-defective ones, is a common occurrence causing difficulties for these methods. Imbalanced learning aims to deal with this problem and has recently been deployed by some researchers, unfortunately with inconsistent results. Objective: We conduct a comprehensive experiment to explore (a) the basic characteristics of this problem; (b) the effect of imbalanced learning and its interactions with (i) data imbalance, (ii) type of classifier, (iii) input metrics and (iv) imbalanced learning method. Method: We systematically evaluate 27 data sets, 7 classifiers, 7 types of input metrics and 17 imbalanced learning methods (including doing nothing) using an experimental design that enables exploration of interactions between these factors and individual imbalanced learning algorithms. This yields 27 × 7 × 7 × 17 = 22,491 results. The Matthews correlation coefficient (MCC) is used as an unbiased performance measure (unlike the more widely used F1 and AUC measures). Results: (a) We found that a large majority (87 percent) of 106 public domain data sets exhibit a moderate or low level of imbalance (imbalance ratio ≤ 10; median = 3.94); (b) anything other than low levels of imbalance clearly harms the performance of traditional learning for SDP; (c) imbalanced learning is more effective on data sets with moderate or higher imbalance, although negative results are always possible; (d) the type of classifier has the most impact on the improvement in classification performance, followed by the imbalanced learning method itself; the type of input metrics is not influential; and (e) only ~52% of the combinations of imbalanced learner and classifier have a significant positive effect. Conclusion: This paper offers two practical guidelines. First, imbalanced learning should only be considered for moderately or highly imbalanced SDP data sets. Second, the appropriate combination of imbalanced learning method and classifier needs to be chosen carefully to ameliorate the imbalanced learning problem for SDP. In contrast, the indiscriminate application of imbalanced learning can be harmful.
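
Two quantities drive the experimental design: the imbalance ratio of a data set (majority class size divided by minority class size) and MCC, computed as (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below evaluates both on made-up module labels and predictions.

```python
# Sketch: imbalance ratio of a defect data set, and MCC as the
# performance measure. Labels and predictions are illustrative.
from collections import Counter
from sklearn.metrics import matthews_corrcoef

y_true = [0] * 80 + [1] * 20              # 80 clean, 20 defective modules
counts = Counter(y_true)
print(f"Imbalance ratio: {counts[0] / counts[1]:.2f}")  # 4.00 (moderate)

# A hypothetical classifier: 75 TN, 5 FP on the clean modules,
# then 12 TP, 8 FN on the defective ones.
y_pred = [0] * 75 + [1] * 5 + [1] * 12 + [0] * 8
print(f"MCC: {matthews_corrcoef(y_true, y_pred):.3f}")   # ~0.572
```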

188 citations


Cites background from "Software fault prediction metrics"

  • ...These choices are made because static code metrics are most frequently used in software defect prediction [68], network metrics may have a stronger association with defects [60] and process metrics reflect the changes to software systems over time....


References
Journal ArticleDOI
TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are formulated in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Abstract: This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.
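
The excerpts below apply this paper's benchmark scale for interpreting agreement. A minimal sketch of that workflow, assuming made-up inclusion/exclusion ratings from two reviewers: compute Cohen's kappa and map it onto the Landis and Koch strength-of-agreement bands.

```python
# Sketch: inter-rater agreement as Cohen's kappa, interpreted on the
# Landis and Koch scale. The two raters' decisions are made up.
from sklearn.metrics import cohen_kappa_score

# Two researchers independently classifying ten papers (1 = include).
rater_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

kappa = cohen_kappa_score(rater_a, rater_b)

# Landis and Koch bands (values below 0 count as "poor" agreement).
bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
         (0.80, "substantial"), (1.00, "almost perfect")]
label = next(name for upper, name in bands if kappa <= upper)
print(f"kappa = {kappa:.2f} ({label})")   # kappa = 0.78 (substantial)
```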

64,109 citations


"Software fault prediction metrics" refers background or methods in this paper

  • ...[73] B. Caglayan, A. Bener, S. Koch, Merits of using repository metrics in defect prediction for open source projects, in: 2009 ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development, 2009, pp. 31–36....


  • ...80 or higher, which is substantial, according to Landis and Koch [36]....


  • ...Eight out of the ten properties had a Cohen’s Kappa coefficient of 0.80 or higher, which is substantial, according to Landis and Koch [36]....


  • ...An inter-rater agreement analysis was then used to determine the degree of agreement between the two researchers (Table 2), which was substantial, according to [36]....


  • ...According to the different levels, as stipulated by Landis and Koch [36], the agreement between the two researchers for the exclusion of primary studies based on title and abstract was ‘substantial’ (0....


Journal ArticleDOI
Jacob Cohen
TL;DR: In this article, the author presents a procedure in which two or more judges independently categorize a sample of units, so that the degree and significance of their agreement, i.e., the extent to which the judgments are reproducible and hence reliable, can be determined.
Abstract: CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of measurement obtainable is nominal scaling (Stevens, 1951, pp. 25-26), i.e. placement in a set of k unordered categories. Because the categorizing of the units is a consequence of some complex judgment process performed by a "two-legged meter" (Stevens, 1958), it becomes important to determine the extent to which these judgments are reproducible, i.e., reliable. The procedure which suggests itself is that of having two (or more) judges independently categorize a sample of units and determine the degree, significance, and
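
The chance-corrected statistic the paper goes on to define is Cohen's kappa; with p_o the observed proportion of agreement and p_e the agreement expected by chance from the judges' marginal category frequencies, it is

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

so that kappa is 1 under perfect agreement and 0 when agreement is no better than chance. The excerpt below notes that the review computed this coefficient for its own inter-rater analysis.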

34,965 citations


Additional excerpts

  • ...Cohen’s Kappa coefficient [14] was calculated....


Book
02 Sep 2011
TL;DR: This research addresses the need for software measures in object-oriented (OO) design through the development and implementation of a new suite of metrics for OO design, and suggests ways in which managers may use these metrics for process improvement.
Abstract: Given the central role that software development plays in the delivery and application of information technology, managers are increasingly focusing on process improvement in the software development area. This demand has spurred the provision of a number of new and/or improved approaches to software development, with perhaps the most prominent being object-orientation (OO). In addition, the focus on process improvement has increased the demand for software measures, or metrics with which to manage the process. The need for such metrics is particularly acute when an organization is adopting a new technology for which established practices have yet to be developed. This research addresses these needs through the development and implementation of a new suite of metrics for OO design. Metrics developed in previous research, while contributing to the field's understanding of software development processes, have generally been subject to serious criticisms, including the lack of a theoretical base. Following Wand and Weber (1989), the theoretical base chosen for the metrics was the ontology of Bunge (1977). Six design metrics are developed, and then analytically evaluated against Weyuker's (1988) proposed set of measurement principles. An automated data collection tool was then developed and implemented to collect an empirical sample of these metrics at two field sites in order to demonstrate their feasibility and suggest ways in which managers may use these metrics for process improvement.
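
As one example from the suite, the WMC metric ("Weighted Methods per Class") is the sum of the complexities of a class's methods; since method complexity was deliberately left open in the original proposal, a common simplification weights every method as 1. Below is a minimal sketch of that simplified WMC over Python source; the class is illustrative.

```python
# Sketch of the CK metric WMC ("Weighted Methods per Class") with the
# common simplification that every method has complexity 1.
import ast

source = """
class Account:
    def deposit(self, amount): ...
    def withdraw(self, amount): ...
    def balance(self): ...
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        # Count method definitions directly in the class body.
        wmc = sum(isinstance(item, ast.FunctionDef) for item in node.body)
        print(f"WMC({node.name}) = {wmc}")   # WMC(Account) = 3
```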

5,476 citations


"Software fault prediction metrics" refers background or methods in this paper

  • ...But, method complexity was deliberately not defined in the original proposal in order to enable the most general application of the metric [13]....


  • ...) The WMC metric (‘Weighted Methods per Class’) is defined as the sum of methods complexity [12,13]....


  • ...In 1991, Chidamber and Kemerer [12] presented the CK metrics suite and refined it in 1994 [13]....


  • ...Chidamber and Kemerer [13])....


  • ...[70], Cartwright and Shepperd [74], Chidamber and Kemerer (CK metrics suite) [12,13], Etzkorn et al....


Book
04 Oct 1993
TL;DR: In this paper, a graph-theoretic complexity measure for managing and controlling program complexity is presented; the measure is independent of physical size and depends only on the decision structure of a program.
Abstract: This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual FORTRAN programs are then presented to illustrate the correlation between intuitive complexity and the graph theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program.The issue of using non-structured control flow is also discussed. A characterization of non-structured control graphs is given and a method of measuring the “structuredness” of a program is developed. The relationship between structure and reducibility is illustrated with several examples.The last section of the paper deals with a testing methodology used in conjunction with the complexity measure; a testing strategy is defined that dictates that a program can either admit of a certain minimal testing level or the program can be structurally reduced.
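
The measure is cyclomatic complexity, V(G) = E - N + 2P for a control-flow graph with E edges, N nodes, and P connected components. The sketch below evaluates it for a small illustrative graph (an if/else nested inside a loop) and shows the equivalent decision-point formulation for structured code.

```python
# Sketch of McCabe's cyclomatic complexity V(G) = E - N + 2P.
# The example graph is illustrative: an if/else nested inside a loop.

def cyclomatic(edges: int, nodes: int, components: int = 1) -> int:
    """Cyclomatic complexity of a control-flow graph."""
    return edges - nodes + 2 * components

# Control-flow graph with 7 nodes (entry, loop test, if test, then,
# else, merge, exit) and 8 edges, including the loop back-edge.
print(cyclomatic(edges=8, nodes=7))    # -> 3

# Equivalent shortcut for structured code: binary decision points + 1.
decision_points = 2                    # loop condition + if condition
print(decision_points + 1)             # -> 3
```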

5,171 citations

Journal ArticleDOI
TL;DR: Several properties of the graph-theoretic complexity measure are proved which show, for example, that complexity is independent of physical size and depends only on the decision structure of a program.
Abstract: This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph-theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual Fortran programs are then presented to illustrate the correlation between intuitive complexity and the graph-theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program.

5,097 citations


"Software fault prediction metrics" refers background or methods in this paper

  • ...[37], Li [41], Li and Henry [39,40], Lorenz and Kidd [42], McCabe [44], Tegarden et al....


  • ...McCabe’s cyclomatic complexity [44] was the most frequently used of the traditional metrics....


  • ...One of the oldest and well-known metrics presented by McCabe [44] dates back to 1976....


  • ...McCabe [44] and Halstead [24])....


  • ...Popular complexity metrics, like McCabe’s cyclomatic complexity [44] and Halstead’s metrics [24], were used in 22 and 12 studies, respectively....
