Showing papers in "IEEE Transactions on Software Engineering in 2007"


Journal ArticleDOI
TL;DR: It is shown that how static code attributes are used to build defect predictors is much more important than which particular attributes are used; contrary to prior pessimism, such predictors are demonstrably useful, yielding a mean probability of detection of 71 percent and a mean false alarm rate of 25 percent.
Abstract: The value of using static code attributes to learn defect predictors has been widely debated. Prior work has explored issues like the merits of "McCabe versus Halstead versus lines of code counts" for generating defect predictors. We show here that such debates are irrelevant, since how the attributes are used to build predictors is much more important than which particular attributes are used. Also, contrary to prior pessimism, we show that such defect predictors are demonstrably useful and, on the data studied here, yield predictors with a mean probability of detection of 71 percent and a mean false alarm rate of 25 percent. These predictors would be useful for prioritizing a resource-bound exploration of code that has yet to be inspected.
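
For illustration, here is a minimal sketch (with invented counts) of how the two accuracy measures quoted above are computed from a binary confusion matrix; the function name and example numbers are ours, not the paper's.

```python
# Sketch: probability of detection (pd) and probability of false alarm (pf)
# computed from a binary confusion matrix. Example counts are invented.

def pd_pf(tp, fn, fp, tn):
    """pd = recall on defective modules; pf = fraction of clean modules flagged."""
    pd = tp / (tp + fn)   # defective modules correctly flagged
    pf = fp / (fp + tn)   # clean modules incorrectly flagged
    return pd, pf

# e.g., 71 of 100 defective modules caught, 225 of 900 clean modules flagged
print(pd_pf(tp=71, fn=29, fp=225, tn=675))  # -> (0.71, 0.25)
```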

1,135 citations


Journal ArticleDOI
TL;DR: A new modeling approach to the Web service selection problem that is particularly effective for large processes and when QoS constraints are severe is introduced.
Abstract: In advanced service oriented systems, complex applications, described as abstract business processes, can be executed by invoking a number of available Web services. End users can specify different preferences and constraints, and service selection can be performed dynamically, identifying the best set of services available at runtime. In this paper, we introduce a new modeling approach to the Web service selection problem that is particularly effective for large processes and when QoS constraints are severe. In the model, the Web service selection problem is formalized as a mixed integer linear programming problem, loop peeling is adopted in the optimization, and constraints posed by stateful Web services are considered. Moreover, negotiation techniques are exploited to identify a feasible solution of the problem, if one does not exist. Experimental results compare our method with other solutions proposed in the literature and demonstrate the effectiveness of our approach toward the identification of an optimal solution to the QoS-constrained Web service selection problem.
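
As a rough illustration of the kind of formulation described above, the following sketch encodes a toy selection problem as a mixed integer linear program with the open-source PuLP library; the services, QoS numbers, and sequential-process assumption are invented, and loop peeling, statefulness, and negotiation are omitted.

```python
# Toy QoS-aware service selection as a MILP: pick exactly one candidate
# service per abstract task, minimizing cost subject to a global
# response-time constraint. Data are invented.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

# candidates[task] = list of (name, cost, response_time_ms)
candidates = {
    "t1": [("a", 4, 120), ("b", 2, 300)],
    "t2": [("c", 5, 80),  ("d", 1, 400)],
}
MAX_TIME = 450  # end-to-end QoS constraint for the (sequential) process

prob = LpProblem("service_selection", LpMinimize)
x = {(t, s): LpVariable(f"x_{t}_{s}", cat=LpBinary)
     for t, svcs in candidates.items() for s, _, _ in svcs}

prob += lpSum(c * x[t, s] for t, svcs in candidates.items() for s, c, _ in svcs)
for t, svcs in candidates.items():               # exactly one service per task
    prob += lpSum(x[t, s] for s, _, _ in svcs) == 1
prob += lpSum(rt * x[t, s] for t, svcs in candidates.items()
              for s, _, rt in svcs) <= MAX_TIME  # QoS constraint

prob.solve()
print([k for k, v in x.items() if value(v) == 1])  # -> [('t1','b'), ('t2','c')]
```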

896 citations


Journal ArticleDOI
TL;DR: A systematic review of previous work identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set to provide a basis for the improvement of software-estimation research.
Abstract: This paper aims to provide a basis for the improvement of software-estimation research through a systematic review of previous work. The review identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context, and data set. A Web-based library of these cost estimation papers is provided to ease the identification of relevant estimation research results. The review results, combined with other knowledge, provide support for recommendations for future software cost estimation research, including: 1) increase the breadth of the search for relevant studies, 2) search manually for relevant papers within a carefully selected set of journals when completeness is essential, 3) conduct more studies on estimation methods commonly used by the software industry, and 4) increase the awareness of how properties of the data sets impact the results when evaluating estimation methods.

835 citations


Journal ArticleDOI
TL;DR: An experiment is presented that evaluates six clone detectors on eight large C and Java programs (altogether almost 850 KLOC); the selected techniques cover the whole spectrum of the state of the art in clone detection.
Abstract: Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision as well as space and time requirements. This paper presents an experiment that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors as an independent third party. The selected techniques cover the whole spectrum of the state-of-the-art in clone detection. The techniques work on text, lexical and syntactic information, software metrics, and program dependency graphs.

765 citations



Journal ArticleDOI
TL;DR: The paper addresses the problems of choice of fitness metric, characterization of landscape modality, and determination of the most suitable search technique to apply, and sheds light on the nature of the regression testing search space, indicating that it is multimodal.
Abstract: Regression testing is an expensive, but important, process. Unfortunately, there may be insufficient resources to allow for the reexecution of all test cases during regression testing. In this situation, test case prioritization techniques aim to improve the effectiveness of regression testing by ordering the test cases so that the most beneficial are executed first. Previous work on regression test case prioritization has focused on greedy algorithms. However, it is known that these algorithms may produce suboptimal results because they may construct results that denote only local minima within the search space. By contrast, metaheuristic and evolutionary search algorithms aim to avoid such problems. This paper presents results from an empirical study of the application of several greedy, metaheuristic, and evolutionary search algorithms to six programs, ranging from 374 to 11,148 lines of code, for three choices of fitness metric. The paper addresses the problems of choice of fitness metric, characterization of landscape modality, and determination of the most suitable search technique to apply. The empirical results replicate previous results concerning greedy algorithms. They shed light on the nature of the regression testing search space, indicating that it is multimodal. The results also show that genetic algorithms perform well, although greedy approaches are surprisingly effective, given the multimodal nature of the landscape.
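
For readers unfamiliar with the greedy baselines studied above, here is a minimal sketch of the classic "additional greedy" prioritization; the coverage data are invented and this is not the paper's exact implementation.

```python
# "Additional greedy" test case prioritization: repeatedly pick the test
# that covers the most not-yet-covered items; when nothing new is left,
# reset and start another round. Coverage sets are invented.
def additional_greedy(coverage):
    """coverage: dict test -> set of covered items; returns an ordering."""
    remaining, covered, order = dict(coverage), set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if remaining[best] <= covered and covered:
            covered = set()        # all remaining tests add nothing: new round
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order

tests = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 5}}
print(additional_greedy(tests))  # -> ['t1', 't2', 't3', 't4']
```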

690 citations


Journal ArticleDOI
TL;DR: The change distilling algorithm is presented, a tree differencing algorithm for fine-grained source code change extraction that approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al.
Abstract: A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. For that, we have improved the existing algorithm by Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that can transform one tree into the other given the computed matching. As a result, we can identify fine-grained change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark that we developed, which consists of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. We achieved significant improvements in extracting types of source code changes: Our algorithm approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script, that is, we reach a mean absolute percentage error of 34 percent, compared to the 79 percent reached by the original algorithm. The paper describes both our change distilling algorithm and the results of our evaluation.
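
The following toy sketch illustrates the two steps named above, matching nodes of two trees and deriving an edit script from the unmatched ones. It matches nodes naively by position, unlike change distilling's similarity-based matching, and all names are ours.

```python
# Naive illustration of: (1) match nodes of two trees, (2) derive an edit
# script from unmatched or changed nodes. Real change distilling matches
# by string and subtree similarity; this toy matches by position only.
# Trees are (label, [children]) tuples.
def nodes(tree, path=()):
    label, children = tree
    yield path, label
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,))

def naive_edit_script(old, new):
    old_n, new_n = dict(nodes(old)), dict(nodes(new))
    script = []
    for p, lbl in old_n.items():
        if p not in new_n:
            script.append(("delete", p, lbl))
        elif new_n[p] != lbl:
            script.append(("update", p, lbl, new_n[p]))
    script += [("insert", p, l) for p, l in new_n.items() if p not in old_n]
    return script

old = ("method", [("if", [("return", [])])])
new = ("method", [("if", [("return", []), ("log", [])])])
print(naive_edit_script(old, new))  # -> [('insert', (0, 1), 'log')]
```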

566 citations



Journal ArticleDOI
TL;DR: The results show that the combination of experts significantly improves the effectiveness of feature location as compared to each of the experts used independently.
Abstract: This paper recasts the problem of feature location in source code as a decision-making problem in the presence of uncertainty. The solution to the problem is formulated as a combination of the opinions of different experts. The experts in this work are two existing techniques for feature location: a scenario-based probabilistic ranking of events and an information-retrieval-based technique that uses latent semantic indexing. The combination of these two experts is empirically evaluated through several case studies, which use the source code of the Mozilla Web browser and the Eclipse integrated development environment. The results show that the combination of experts significantly improves the effectiveness of feature location as compared to each of the experts used independently.
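
A minimal sketch of the combination idea, assuming each expert yields normalized relevance scores and using a simple affine combination; the weight and scores are invented, and the paper's actual fusion differs in detail.

```python
# Combine two "experts" that each score methods as relevant to a feature,
# via a weighted combination of their scores. Scores/weight are invented.
def combine(expert_a, expert_b, weight=0.5):
    """expert_*: dict method -> relevance score in [0, 1]."""
    methods = set(expert_a) | set(expert_b)
    fused = {m: weight * expert_a.get(m, 0.0)
                + (1 - weight) * expert_b.get(m, 0.0) for m in methods}
    return sorted(fused, key=fused.get, reverse=True)

lsi_scores = {"draw()": 0.9, "save()": 0.2, "render()": 0.7}    # IR expert
trace_scores = {"draw()": 0.8, "render()": 0.9, "init()": 0.3}  # dynamic expert
print(combine(lsi_scores, trace_scores))  # -> ['draw()', 'render()', ...]
```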

473 citations


Journal ArticleDOI
TL;DR: It is clear that some organizations would be ill-served by cross-company models whereas others would benefit, and experimenters need to standardize their experimental procedures to enable formal meta-analysis of the results.
Abstract: The objective of this paper is to determine under what circumstances individual organizations would be able to rely on cross-company-based estimation models. We performed a systematic review of studies that compared predictions from cross-company models with predictions from within-company models based on analysis of project data. Ten papers compared cross-company and within-company estimation models; however, only seven presented independent results. Of those seven, three found that cross-company models were not significantly different from within-company models, and four found that cross-company models were significantly worse than within-company models. Experimental procedures used by the studies differed, making it impossible to undertake formal meta-analysis of the results. The main trend distinguishing study results was that studies with small within-company data sets (i.e., ≤20 projects) that used leave-one-out cross validation all found that the within-company model was significantly better than the cross-company model. The results of this review are inconclusive. It is clear that some organizations would be ill-served by cross-company models whereas others would benefit. Further studies are needed, but they must be independent (i.e., based on different databases or at least different single-company data sets) and should address specific hypotheses concerning the conditions that would favor cross-company or within-company models. In addition, experimenters need to standardize their experimental procedures to enable formal meta-analysis, and recommendations are made in Section 3.
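
To make the leave-one-out procedure mentioned above concrete, here is a small sketch comparing within-company and cross-company estimates by MMRE; the median-productivity estimator and all project data are invented.

```python
# Leave-one-out comparison of within- vs cross-company effort estimation,
# scored by mean magnitude of relative error (MMRE). Data are invented.
def mmre(actuals, estimates):
    return sum(abs(a - e) / a for a, e in zip(actuals, estimates)) / len(actuals)

def estimate(train, size):   # train: (size, effort) pairs; median productivity
    med_prod = sorted(e / s for s, e in train)[len(train) // 2]
    return med_prod * size

within = [(10, 50), (20, 90), (15, 80), (30, 160)]   # one company's projects
cross = [(12, 30), (25, 200), (18, 60), (40, 150)]   # other companies' projects

loo_est = [estimate(within[:i] + within[i+1:], s) for i, (s, _) in enumerate(within)]
cc_est = [estimate(cross, s) for s, _ in within]
acts = [e for _, e in within]
print("within MMRE:", mmre(acts, loo_est), "cross MMRE:", mmre(acts, cc_est))
```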

408 citations


Journal ArticleDOI
TL;DR: This paper provides a detailed analysis of the behavior of various similarity and distance measures that may be employed for software clustering, and analyzes the clustering process of various well-known clustering algorithms by using multiple criteria.
Abstract: Gaining an architectural level understanding of a software system is important for many reasons. When the description of a system's architecture does not exist, attempts must be made to recover it. In recent years, researchers have explored the use of clustering for recovering a software system's architecture, given only its source code. The main contributions of this paper are given as follows. First, we review hierarchical clustering research in the context of software architecture recovery and modularization. Second, to employ clustering meaningfully, it is necessary to understand the peculiarities of the software domain, as well as the behavior of clustering measures and algorithms in this domain. To this end, we provide a detailed analysis of the behavior of various similarity and distance measures that may be employed for software clustering. Third, we analyze the clustering process of various well-known clustering algorithms by using multiple criteria, and we show how arbitrary decisions taken by these algorithms during clustering affect the quality of their results. Finally, we present an analysis of two recently proposed clustering algorithms, revealing close similarities in their apparently different clustering approaches. Experiments on four legacy software systems provide insight into the behavior of well-known clustering algorithms and their characteristics in the software domain.
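
As a small illustration of the similarity measures and algorithms analyzed above, the following sketch computes Jaccard distances between entities described by binary features and clusters them hierarchically with SciPy; the feature matrix is invented.

```python
# Jaccard distance between entities described by binary features (e.g.,
# which globals/functions a file uses), fed to agglomerative clustering.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# rows = entities (files), cols = binary features they exhibit (invented)
features = np.array([[1, 1, 0, 0],
                     [1, 1, 1, 0],
                     [0, 0, 1, 1],
                     [0, 0, 0, 1]], dtype=bool)

dist = pdist(features, metric="jaccard")         # one of the studied measures
tree = linkage(dist, method="complete")          # hierarchical clustering
print(fcluster(tree, t=2, criterion="maxclust")) # -> e.g. [1 1 2 2]
```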

Journal ArticleDOI
TL;DR: This paper surveys and analyzes current component models, classifies them into a taxonomy based on commonly accepted desiderata for CBD, and, for each category in the taxonomy, describes its key characteristics and evaluates them with respect to these desiderata.
Abstract: Component-based development (CBD) is an important emerging topic in software engineering, promising long-sought-after benefits like increased reuse, reduced time to market, and, hence, reduced software production cost. The cornerstone of a CBD technology is its underlying software component model, which defines components and their composition mechanisms. Current models use objects or architectural units as components. These are not ideal for component reuse or systematic composition. In this paper, we survey and analyze current component models and classify them into a taxonomy based on commonly accepted desiderata for CBD. For each category in the taxonomy, we describe its key characteristics and evaluate them with respect to these desiderata.

Journal ArticleDOI
TL;DR: Analyzing multivariate binary logistic regression models across six Rhino versions indicates these models may be useful in assessing quality in OO classes produced using modern highly iterative or agile software development processes.
Abstract: Empirical validation of software metrics suites to predict fault proneness in object-oriented (OO) components is essential to ensure their practical use in industrial settings. In this paper, we empirically validate three OO metrics suites for their ability to predict software quality in terms of fault-proneness: the Chidamber and Kemerer (CK) metrics, Abreu's Metrics for Object-Oriented Design (MOOD), and Bansiya and Davis' Quality Metrics for Object-Oriented Design (QMOOD). Some CK class metrics have previously been shown to be good predictors of initial OO software quality. However, the other two suites have not been heavily validated except by their original proposers. Here, we explore the ability of these three metrics suites to predict fault-prone classes using defect data for six versions of Rhino, an open-source implementation of JavaScript written in Java. We conclude that the CK and QMOOD suites contain similar components and produce statistical models that are effective in detecting error-prone classes. We also conclude that the class components in the MOOD metrics suite are not good class fault-proneness predictors. Analyzing multivariate binary logistic regression models across six Rhino versions indicates these models may be useful in assessing quality in OO classes produced using modern highly iterative or agile software development processes.
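
A minimal sketch of fitting such a multivariate binary logistic regression on CK-style metrics, using scikit-learn; the metric values and fault labels are synthetic.

```python
# Binary logistic regression relating CK metrics to fault proneness.
# The metric values and labels below are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: WMC, CBO, RFC (three CK metrics); one row per class
X = np.array([[5, 2, 10], [30, 9, 60], [12, 4, 25], [45, 12, 90],
              [8, 1, 14], [25, 8, 55]])
y = np.array([0, 1, 0, 1, 0, 1])   # 1 = at least one post-release fault

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[20, 7, 40]])[0, 1])  # P(fault-prone) for a new class
```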

Journal ArticleDOI
TL;DR: The results of this experiment do not support the hypotheses that pair programming in general reduces the time required to solve the tasks correctly or increases the proportion of correct solutions.
Abstract: A total of 295 junior, intermediate, and senior professional Java consultants (99 individuals and 98 pairs) from 29 international consultancy companies in Norway, Sweden, and the UK were hired for one day to participate in a controlled experiment on pair programming. The subjects used professional Java tools to perform several change tasks on two alternative Java systems with different degrees of complexity. The results of this experiment do not support the hypotheses that pair programming in general reduces the time required to solve the tasks correctly or increases the proportion of correct solutions. On the other hand, there is a significant 84 percent increase in effort to perform the tasks correctly. However, on the more complex system, the pair programmers had a 48 percent increase in the proportion of correct solutions but no significant differences in the time taken to solve the tasks correctly. For the simpler system, there was a 20 percent decrease in time taken but no significant differences in correctness. However, the moderating effect of system complexity depends on the programmer expertise of the subjects. The observed benefits of pair programming in terms of correctness on the complex system apply mainly to juniors, whereas the reductions in duration to perform the tasks correctly on the simple system apply mainly to intermediates and seniors. It is possible that the benefits of pair programming will exceed the results obtained in this experiment for larger, more complex tasks and if the pair programmers have a chance to work together over a longer period of time.

Journal ArticleDOI
TL;DR: In response to Zhang and Zhang, it is argued that defect predictors need not have high precision to be useful: for SE data sets with large neg/pos ratios, precision often must be lowered to achieve higher recall, and many domains benefit from low-precision detectors.
Abstract: Zhang and Zhang argue that predictors are useless unless they have high precision and recall. We have a different view, for two reasons. First, for SE data sets with large neg/pos ratios, it is often required to lower precision to achieve higher recall. Second, there are many domains where low-precision detectors are useful.
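
The neg/pos argument can be made concrete with a few lines of arithmetic: assuming 1,000 modules of which 10 percent are defective (an invented but typical ratio), the pd/pf figures from the original paper already imply low precision.

```python
# Worked example of the class-imbalance argument: with 1,000 modules, 10%
# defective, pd = 0.71 and pf = 0.25 (the original paper's figures),
# precision is forced low even though the detector is useful.
defective, clean = 100, 900
tp = 0.71 * defective        # 71 defective modules flagged
fp = 0.25 * clean            # 225 clean modules flagged
precision = tp / (tp + fp)
print(round(precision, 2))   # -> 0.24, despite pd = 0.71 and pf = 0.25
```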

Journal ArticleDOI
TL;DR: It is found that theory use and awareness of theoretical issues are present, but that theory-driven research is, as yet, not a major issue in empirical software engineering.
Abstract: Empirically based theories are generally perceived as foundational to science. However, in many disciplines, the nature, role, and even the necessity of theories remain matters for debate, particularly in young or practical disciplines such as software engineering. This article reports a systematic review of the explicit use of theory in a comprehensive set of 103 articles reporting experiments, from a total of 5,453 articles published in major software engineering journals and conferences in the decade 1993-2002. Of the 103 articles, 24 use a total of 40 theories in various ways to explain the cause-effect relationship(s) under investigation. The majority of these use theory in the experimental design to justify research questions and hypotheses, some use theory to provide post hoc explanations of their results, and a few test or modify theory. A third of the theories are proposed by authors of the reviewed articles. The interdisciplinary nature of the theories used is greater than that of research in software engineering in general. We found that theory use and awareness of theoretical issues are present, but that theory-driven research is, as yet, not a major issue in empirical software engineering. Several articles comment explicitly on the lack of relevant theory. We call for an increased awareness of the potential benefits of involving theory, when feasible. To support software engineering researchers who wish to use theory, we show which of the reviewed articles on which topics use which theories for what purposes, as well as details of the theories' characteristics.

Journal ArticleDOI
TL;DR: A comprehensive study of an implementation of Smalltalk, one of the first and purest object-oriented programming environments, searches for scaling laws in ten of its properties and systematically finds Pareto - or sometimes log-normal - distributions.
Abstract: We present a comprehensive study of an implementation of the Smalltalk object-oriented system, one of the first and purest object-oriented programming environments, searching for scaling laws in its properties. We study ten system properties, including the distributions of variable and method names, inheritance hierarchies, class and method sizes, and the system architecture graph. We systematically found Pareto - or sometimes log-normal - distributions in these properties. This denotes that the programming activity, even when modeled from a statistical perspective, can in no way be simply modeled as a random addition of independent increments with finite variance, but exhibits strong organic dependencies on what has already been developed. We compare our results with similar ones obtained for large Java systems, reported in the literature or computed by ourselves for those properties never studied before, showing that the behavior found is similar in all studied object-oriented systems. We show how the Yule process is able to stochastically model the generation of several of the power laws found, identifying the process parameters and comparing theoretical and empirical tail indexes. Lastly, we discuss how the distributions found are related to existing object-oriented metrics, like Chidamber and Kemerer's, and how they could provide a starting point for measuring the quality of a whole system, versus that of single classes. In fact, the usual evaluation of systems based on mean and standard deviation of metrics can be misleading. It is more interesting to measure differences in the shape and coefficients of the data's statistical distributions.
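
A small simulation conveys the Yule-process idea referenced above: each new method joins an existing class with probability proportional to the class's size, otherwise a new class is born. The parameters below are invented.

```python
# Yule / preferential-attachment sketch: "rich get richer" growth that
# generates heavy-tailed (power-law-like) class-size distributions.
import random
from collections import Counter

def yule(steps, p_new=0.1, seed=1):
    random.seed(seed)
    classes = [1]                                  # class sizes; start with one
    for _ in range(steps):
        if random.random() < p_new:
            classes.append(1)                      # a new class is born
        else:                                      # attach proportionally to size
            i = random.choices(range(len(classes)), weights=classes)[0]
            classes[i] += 1
    return classes

sizes = yule(10_000)
print(Counter(sizes).most_common(5))  # many tiny classes, a few huge ones
```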

Journal ArticleDOI
TL;DR: It is shown that vulnerability announcements lead to a negative and significant change in a software vendor's market value, which is more negative if the vendor fails to provide a patch at the time of disclosure.
Abstract: Security defects in software cost millions of dollars to firms in terms of downtime, disruptions, and confidentiality breaches. However, the economic implications of these defects for software vendors are not well understood. Lack of legal liability and the presence of switching costs and network externalities may protect software vendors from incurring significant costs in the event of a vulnerability announcement, unlike such industries as auto and pharmaceuticals, which have been known to suffer significant loss in market value in the event of a defect announcement. Although research in software economics has studied firms' incentives to improve overall quality, there have not been any studies which show that software vendors have an incentive to invest in building more secure software. The objectives of this paper are twofold. 1) We examine how a software vendor's market value changes when a vulnerability is announced. 2) We examine how firm and vulnerability characteristics mediate the change in the market value of a vendor. We collect data from leading national newspapers and industry sources, such as the Computer Emergency Response Team (CERT), by searching for reports on published software vulnerabilities. We show that vulnerability announcements lead to a negative and significant change in a software vendor's market value. In our sample, on average, a vendor loses around 0.6 percent in stock price when a vulnerability is reported. We find that a software vendor loses more market value if the market is competitive or if the vendor is small. To provide further insight, we use the information content of the disclosure announcement to classify vulnerabilities into various types. We find that the change in stock price is more negative if the vendor fails to provide a patch at the time of disclosure. Also, more severe flaws have a significantly greater impact. Our analysis provides many interesting implications for software vendors as well as policy makers.
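
Results like these typically come from a market-model event study. The following sketch shows the standard computation of an abnormal return on the announcement day; all return figures are invented.

```python
# Market-model event study sketch: fit vendor returns against market
# returns over an estimation window, then measure the abnormal return on
# the disclosure day. All returns below are invented.
import numpy as np

est_market = np.array([0.001, -0.004, 0.002, 0.003, -0.001, 0.002])
est_vendor = np.array([0.002, -0.005, 0.003, 0.004, -0.002, 0.002])

beta, alpha = np.polyfit(est_market, est_vendor, 1)   # market-model fit

market_on_event, vendor_on_event = 0.001, -0.006      # disclosure day
expected = alpha + beta * market_on_event
abnormal = vendor_on_event - expected
print(f"abnormal return: {abnormal:.4f}")             # negative => value lost
```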

Journal ArticleDOI
TL;DR: The results in this paper indicate that some of the biggest rewards from high levels of process maturity come from the reduction in variance of software development outcomes that were caused by factors other than software size.
Abstract: The Capability Maturity Model (CMM) has become a popular methodology for improving software development processes with the goal of developing high-quality software within budget and planned cycle time. Prior research literature, while not exclusively focusing on CMM level 5 projects, has identified a host of factors as determinants of software development effort, quality, and cycle time. In this study, we focus exclusively on CMM level 5 projects from multiple organizations to study the impacts of highly mature processes on effort, quality, and cycle time. Using a linear regression model based on data collected from 37 CMM level 5 projects of four organizations, we find that high levels of process maturity, as indicated by a CMM level 5 rating, reduce the effects of most factors that were previously believed to impact software development effort, quality, and cycle time. The only factor found to be significant in determining effort, cycle time, and quality was software size. On average, across organizations, the developed models predicted effort and cycle time to within about 12 percent of the actuals, and defects to within about 49 percent. Overall, the results in this paper indicate that some of the biggest rewards from high levels of process maturity come from the reduction in variance of software development outcomes that were caused by factors other than software size.

Journal ArticleDOI
TL;DR: A Bayesian network model is built to relate object-oriented software metrics to software fault content and fault proneness and is shown to be able to represent linear, Poisson, and binomial logistic regression.
Abstract: We present a methodology for Bayesian analysis of software quality. We cast our research in the broader context of constructing a causal framework that can include process, product, and other diverse sources of information regarding fault introduction during the software development process. In this paper, we discuss the aspect of relating internal product metrics to external quality metrics. Specifically, we build a Bayesian network (BN) model to relate object-oriented software metrics to software fault content and fault proneness. Assuming that the relationship can be described as a generalized linear model, we derive parametric functional forms for the target node conditional distributions in the BN. These functional forms are shown to be able to represent linear, Poisson, and binomial logistic regression. The models are empirically evaluated using a public domain data set from a software subsystem. The results show that our approach produces statistically significant estimations and that our overall modeling method performs no worse than existing techniques.
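
As one concrete instance of the GLM forms mentioned above, here is a Poisson regression of fault counts on OO metrics using statsmodels; the data are synthetic, and the paper embeds such functional forms inside BN node distributions rather than fitting them standalone.

```python
# Poisson regression (a generalized linear model) of fault counts on OO
# metrics -- one of the conditional-distribution forms named above.
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(np.array([[5.0, 2], [30, 9], [12, 4], [45, 12],
                              [8, 1], [25, 8]]))        # e.g. WMC, CBO (invented)
faults = np.array([0, 4, 1, 7, 0, 3])                   # faults per class

glm = sm.GLM(faults, X, family=sm.families.Poisson()).fit()
print(glm.params)          # coefficients of the target node's CPD
print(glm.predict(X[:1]))  # expected fault content for the first class
```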

Journal ArticleDOI
TL;DR: This correspondence points out a discrepancy in a recent paper, "Data Mining Static Code Attributes to Learn Defect Predictors," published in this journal.
Abstract: In this correspondence, we point out a discrepancy in a recent paper, "Data Mining Static Code Attributes to Learn Defect Predictors," that was published in this journal. Because of the small percentage of defective modules, using probability of detection (pd) and probability of false alarm (pf) as accuracy measures may lead to impractical prediction models.

Journal ArticleDOI
TL;DR: This work presents a new approach for test suite reduction that attempts to use additional coverage information of test cases to selectively keep some additional test cases in the reduced suites that are redundant with respect to the testing criteria used for suite minimization, with the goal of improving the FDE retention of the reduced suites.
Abstract: Software testing is a critical part of software development. As new test cases are generated over time due to software modifications, test suite sizes may grow significantly. Because of time and resource constraints for testing, test suite minimization techniques are needed to remove those test cases from a suite that, due to code modifications over time, have become redundant with respect to the coverage of testing requirements for which they were generated. Prior work has shown that test suite minimization with respect to a given testing criterion can significantly diminish the fault detection effectiveness (FDE) of suites. We present a new approach for test suite reduction that attempts to use additional coverage information of test cases to selectively keep some additional test cases in the reduced suites that are redundant with respect to the testing criteria used for suite minimization, with the goal of improving the FDE retention of the reduced suites. We implemented our approach by modifying an existing heuristic for test suite minimization. Our experiments show that our approach can significantly improve the FDE of reduced test suites without severely affecting the extent of suite size reduction.
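
A minimal sketch of the "selective redundancy" idea: tests that add nothing to the primary criterion are still kept if they add secondary coverage. The greedy order and data are our simplifications, not the paper's modified heuristic.

```python
# Reduce a suite against a primary criterion, but keep tests that are
# redundant for it when they still add secondary coverage. Data invented.
def reduce_with_secondary(primary, secondary):
    """primary/secondary: dict test -> set of covered requirements."""
    kept, p_cov, s_cov = [], set(), set()
    for t in sorted(primary, key=lambda t: -len(primary[t])):
        new_p = primary[t] - p_cov
        new_s = secondary[t] - s_cov
        if new_p or new_s:           # redundant for primary, but adds secondary
            kept.append(t)
            p_cov |= primary[t]
            s_cov |= secondary[t]
    return kept

stmt = {"t1": {1, 2, 3}, "t2": {2, 3}, "t3": {4}}        # primary: statements
branch = {"t1": {"b1"}, "t2": {"b2"}, "t3": {"b3"}}      # secondary: branches
print(reduce_with_secondary(stmt, branch))  # keeps t2 for its branch coverage
```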

Journal ArticleDOI
TL;DR: An approach to the API-evolution problem in reuse-based software development is discussed that automatically recognizes API changes of the reused framework and proposes plausible replacements for the "obsolete" API based on working examples of the framework code base.
Abstract: Applications built on reusable component frameworks are subject to two independent, and potentially conflicting, evolution processes. The application evolves in response to the specific requirements and desired qualities of the application's stakeholders. On the other hand, the evolution of the component framework is driven by the need to improve the framework functionality and quality while maintaining its generality. Thus, changes to the component framework frequently change its API on which its client applications rely and, as a result, these applications break. To date, there has been some work aimed at supporting the migration of client applications to newer versions of their underlying frameworks, but it usually requires that the framework developers do additional work for that purpose or that the application developers use the same tools as the framework developers. In this paper, we discuss our approach to tackle the API-evolution problem in the context of reuse-based software development, which automatically recognizes the API changes of the reused framework and proposes plausible replacements to the "obsolete" API based on working examples of the framework code base. This approach has been implemented in the Diff-CatchUp tool. We report on two case studies that we have conducted to evaluate the effectiveness of our approach with its Diff-CatchUp prototype.
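
A toy version of the replacement-suggestion step: find API methods that disappeared between framework versions and propose the closest surviving name. Diff-CatchUp additionally mines working examples from the code base; this sketch uses name similarity only, and the API names are invented.

```python
# Suggest replacements for API methods removed between framework versions,
# using simple name similarity. APIs below are invented examples.
import difflib

old_api = {"createWidget", "getLabel", "setSize", "paint"}
new_api = {"createWidget", "getLabelText", "resize", "paint", "repaint"}

for gone in sorted(old_api - new_api):
    match = difflib.get_close_matches(gone, sorted(new_api), n=1, cutoff=0.4)
    print(f"{gone} -> {match[0] if match else '??'}")
# getLabel -> getLabelText, setSize -> resize
```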

Journal ArticleDOI
TL;DR: This work proposes an approach based on developing specific guidelines that capitalize upon key elements recurrently intervening in the usability feature elicitation and specification process; these guidelines provide requirements analysts with a knowledge repository.
Abstract: Like any other quality attribute, usability imposes specific constraints on software components. Features that raise the software system's usability have to be considered from the earliest development stages. But, discovering and documenting usability features is likely to be beyond the usability knowledge of most requirements engineers, developers, and users. We propose an approach based on developing specific guidelines that capitalize upon key elements recurrently intervening in the usability features elicitation and specification process. The use of these guidelines provides requirements analysts with a knowledge repository. They can use this repository to ask the right questions and capture precise usability requirements information.

Journal ArticleDOI
TL;DR: It is shown that tranquillity is easier to obtain and less disruptive for the running application, yet still a sufficient condition to ensure application consistency; an implementation on a component middleware platform is also presented.
Abstract: This paper revisits a problem that was identified by Kramer and Magee: placing a system in a consistent state before and after runtime changes. We show that their notion of quiescence as a necessary and sufficient condition for safe runtime changes is too strict and results in a significant disruption in the application being updated. In this paper, we introduce a weaker condition: tranquillity. We show that tranquillity is easier to obtain and less disruptive for the running application, but still a sufficient condition to ensure application consistency. We present an implementation of our approach on a component middleware platform and experimentally verify the validity and practical applicability of our approach using data retrieved from a case study.

Journal ArticleDOI
TL;DR: A literal replication of Fenton and Ohlsson's study on fault distributions confirms that the uneven distribution of defects motivates an uneven distribution of quality assurance efforts, although predictors for such distribution of efforts are not sufficiently precise.
Abstract: To contribute to the body of empirical research on fault distributions during development of complex software systems, a replication of a study of Fenton and Ohlsson is conducted. The hypotheses from the original study are investigated using data taken from an environment that differs in terms of system size, project duration, and programming language. We have investigated four sets of hypotheses on data from three successive telecommunications projects: 1) the Pareto principle, that is, a small number of modules contain a majority of the faults (in the replication, the Pareto principle is confirmed), 2) fault persistence between test phases (a high fault incidence in function testing is shown to imply the same in system testing, as well as prerelease versus postrelease fault incidence), 3) the relation between number of faults and lines of code (the size relation from the original study could be neither confirmed nor disproved in the replication), and 4) fault density similarities across test phases and projects (in the replication study, fault densities are confirmed to be similar across projects). Through this replication study, we have contributed to what is known on fault distributions, which seem to be stable across environments.

Journal ArticleDOI
TL;DR: The extensions discussed here require only relatively small changes in the SPIN source code and are compatible with most existing verification modes such as partial order reduction, the verification of temporal logic formulas, bitstate hashing, and hash-compact compression.
Abstract: We describe an extension of the SPIN model checker for use on multicore shared-memory systems and report on its performance. We show how, with proper load balancing, the time requirements of a verification run can, in some cases, be reduced close to N-fold when N processing cores are used. We also analyze the types of verification problems for which multicore algorithms cannot provide relief. The extensions discussed here require only relatively small changes in the SPIN source code and are compatible with most existing verification modes such as partial order reduction, the verification of temporal logic formulas, bitstate hashing, and hash-compact compression.

Journal ArticleDOI
TL;DR: The incremental test suite update algorithm, coupled with an experimental study, indicates that concept analysis provides a promising means for incrementally updating reduced test suites in response to newly captured user sessions, with little loss in fault detection capability and program coverage.
Abstract: The continuous use of the Web for daily operations by businesses, consumers, and the government has created a great demand for reliable Web applications. One promising approach to testing the functionality of Web applications leverages the user-session data collected by Web servers. User-session-based testing automatically generates test cases based on real user profiles. The key contribution of this paper is the application of concept analysis for clustering user sessions and a set of heuristics for test case selection. Existing incremental concept analysis algorithms are exploited to avoid collecting and maintaining large user-session data sets and to thus provide scalability. We have completely automated the process from user session collection and test suite reduction through test case replay. Our incremental test suite update algorithm, coupled with our experimental study, indicates that concept analysis provides a promising means for incrementally updating reduced test suites in response to newly captured user sessions with little loss in fault detection capability and program coverage.
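
Greatly simplified for illustration, the sketch below groups user sessions by identical request sets and keeps one representative each; real concept analysis builds a lattice of shared-request concepts incrementally, which this does not attempt.

```python
# Toy session clustering: group user sessions with the same URL set and
# keep one representative per group. Sessions are invented.
sessions = {
    "s1": {"/login", "/cart", "/checkout"},
    "s2": {"/login", "/cart", "/checkout"},
    "s3": {"/login", "/search"},
    "s4": {"/search"},
}

groups = {}
for sid, urls in sessions.items():
    groups.setdefault(frozenset(urls), []).append(sid)

reduced_suite = [members[0] for members in groups.values()]
print(reduced_suite)  # one representative session per distinct URL set
```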

Journal ArticleDOI
TL;DR: This work clarifies the structural deficiencies encapsulated in test smells by formalizing core test concepts and their characteristics and proposes a set of metrics defined in terms of unit test concepts to support the detection of two test smells.
Abstract: As a fine-grained defect detection technique, unit testing introduces a strong dependency on the structure of the code. Accordingly, test coevolution forms an additional burden on the software developer which can be tempered by writing tests in a manner that makes them easier to change. Fortunately, we are able to concretely express what a good test is by exploiting the specific principles underlying unit testing. Analogous to the concept of code smells, violations of these principles are termed test smells. In this paper, we clarify the structural deficiencies encapsulated in test smells by formalizing core test concepts and their characteristics. To support the detection of two such test smells, General Fixture and Eager Test, we propose a set of metrics defined in terms of unit test concepts. We compare their detection effectiveness using manual inspection and through a comparison with human reviewing. Although the latter is the traditional means for test quality assurance, our results indicate it is not a reliable means for test smell detection. This work thus stresses the need for a more reliable detection mechanism and provides an initial contribution through the validation of test smell metrics.
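
As a flavor of such metrics, the sketch below counts the distinct production methods a unit test invokes, a plausible indicator for the Eager Test smell; the threshold, example code, and AST heuristic are ours, not the paper's metric definitions.

```python
# Count distinct production methods invoked by a test -- a rough
# indicator for the Eager Test smell (a test exercising too much at once).
# The example test and the threshold of 3 are invented.
import ast

test_src = '''
def test_account():
    acc = Account()
    acc.deposit(10)
    acc.withdraw(5)
    acc.close()
    assert acc.balance() == 5
'''

tree = ast.parse(test_src)
called = {node.func.attr for node in ast.walk(tree)
          if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)}
print(called, "eager" if len(called) > 3 else "ok")  # 4 methods -> 'eager'
```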

Journal ArticleDOI
TL;DR: A new set of metrics is presented that measures the quality of modularization of a non-object-oriented software system, with several of the metrics based on information-theoretic principles.
Abstract: We present in this paper a new set of metrics that measure the quality of modularization of a non-object-oriented software system. We have proposed a set of design principles to capture the notion of modularity and defined metrics centered around these principles. These metrics characterize the software from a variety of perspectives: structural, architectural, and notions such as the similarity of purpose and commonality of goals. (By structural, we are referring to intermodule coupling-based notions, and by architectural, we mean the horizontal layering of modules in large software systems.) We employ the notion of API (application programming interface) as the basis for our structural metrics. The rest of the metrics we present are in support of those that are based on API. Some of the important support metrics include those that characterize each module on the basis of the similarity of purpose of the services offered by the module. These metrics are based on information-theoretic principles. We tested our metrics on some popular open-source systems and some large legacy-code business applications. To validate the metrics, we compared the results obtained on human-modularized versions of the software (as created by the developers of the software) with those obtained on randomized versions of the code. For the randomized versions, the assignment of the individual functions to modules was randomized.
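
For illustration only, here is a simple coupling-style measurement in the spirit of the structural metrics above: the fraction of calls that stay inside their module. The call graph and module map are invented, and the paper's actual metrics are API-based and information-theoretic rather than this ratio.

```python
# Fraction of function calls that stay within their module -- a crude
# coupling-style indicator (higher suggests better modularization).
calls = [("parse", "lex"), ("parse", "emit"), ("lex", "scan"),
         ("emit", "write"), ("opt", "emit")]
module_of = {"parse": "front", "lex": "front", "scan": "front",
             "emit": "back", "write": "back", "opt": "back"}

intra = sum(module_of[a] == module_of[b] for a, b in calls)
print(f"intra-module call ratio: {intra / len(calls):.2f}")  # -> 0.80
```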