Author
Thomas J. Ostrand
Other affiliations: AT&T Labs, AT&T, Princeton University
Bio: Thomas J. Ostrand is an academic researcher from Mälardalen University College. The author has contributed to research in topics: Software system & Test case. The author has an hindex of 25, co-authored 55 publications receiving 4745 citations. Previous affiliations of Thomas J. Ostrand include AT&T Labs & AT&T.
Topics: Software system, Test case, Test suite, Software, Software development
Papers published on a yearly basis
Papers
More filters
••
21 May 1994TL;DR: An experimental study investigating the effectiveness of two code-based test adequacy criteria for identifying sets of test cases that detect faults found that tests based respectively on control-flow and dataflow criteria are frequency complementary in their effectiveness.
Abstract: This paper reports an experimental study investigating the effectiveness of two code-based test adequacy criteria for identifying sets of test cases that detect faults. The all-edges and all-DUs (modified all-uses) coverage criteria were applied to 130 faulty program versions derived from seven moderate size base programs by seeding realistic faults. We generated several thousand test sets for each faulty program and examined the relationship between fault detection and coverage. Within the limited domain of our experiments, test sets achieving coverage levels over 90% usually showed significantly better fault detection than randomly chosen test sets of the same size. In addition, significant improvements in the effectiveness of coverage-based tests usually occurred as coverage increased from 90% to 100%. However the results also indicate that 100% code coverage alone is not a reliable indicator of the effectiveness of a test set. We also found that tests based respectively on control-flow and dataflow criteria are frequency complementary in their effectiveness. >
922 citations
••
TL;DR: A method for creating functional test suites has been developed in which a test engineer analyzes the system specification, writes a series of formal test specifications, and then uses a generator tool to produce test descriptions from which test scripts are written.
Abstract: A method for creating functional test suites has been developed in which a test engineer analyzes the system specification, writes a series of formal test specifications, and then uses a generator tool to produce test descriptions from which test scripts are written. The advantages of this method are that the tester can easily modify the test specification when necessary, and can control the complexity and number of the tests by annotating the tests specification with constraints.
827 citations
••
TL;DR: A negative binomial regression model has been developed and used to predict the expected number of faults in each file of the next release of a system, based on the code of the file in the current release, and fault and modification history of thefile from previous releases.
Abstract: Advance knowledge of which files in the next release of a large software system are most likely to contain the largest numbers of faults can be a very valuable asset. To accomplish this, a negative binomial regression model has been developed and used to predict the expected number of faults in each file of the next release of a system. The predictions are based on the code of the file in the current release, and fault and modification history of the file from previous releases. The model has been applied to two large industrial systems, one with a history of 17 consecutive quarterly releases over 4 years, and the other with nine releases over 2 years. The predictions were quite accurate: for each release of the two systems, the 20 percent of the files with the highest predicted number of faults contained between 71 percent and 92 percent of the faults that were actually detected, with the overall average being 83 percent. The same model was also used to predict which files of the first system were likely to have the highest fault densities (faults per KLOC). In this case, the 20 percent of the files with the highest predicted fault densities contained an average of 62 percent of the system's detected faults. However, the identified files contained a much smaller percentage of the code mass than the files selected to maximize the numbers of faults. The model was also used to make predictions from a much smaller input set that only contained fault data from integration testing and later. The prediction was again very accurate, identifying files that contained from 71 percent to 93 percent of the faults, with the average being 84 percent. Finally, a highly simplified version of the predictor selected files containing, on average, 73 percent and 74 percent of the faults for the two systems.
704 citations
••
[...]
TL;DR: A negative binomial regression model using information from previous releases has been developed and used to predict the numbers of faults for a large industrial inventory system, and was extremely accurate.
Abstract: The ability to predict which files in a large software system are most likely to contain the largest numbers of faults in the next release can be a very valuable asset. To accomplish this, a negative binomial regression model using information from previous releases has been developed and used to predict the numbers of faults for a large industrial inventory system. The files of each release were sorted in descending order based on the predicted number of faults and then the first 20% of the files were selected. This was done for each of fifteen consecutive releases, representing more than four years of field usage. The predictions were extremely accurate, correctly selecting files that contained between 71% and 92% of the faults, with the overall average being 83%. In addition, the same model was used on data for the same system's releases, but with all fault data prior to integration testing removed. The prediction was again very accurate, ranging from 71% to 93%, with the average being 84%. Predictions were made for a second system, and again the first 20% of files accounted for 83% of the identified faults. Finally, a highly simplified predictor was considered which correctly predicted 73% and 74% of the faults for the two systems.
303 citations
••
01 Jul 2002TL;DR: The ultimate goal of this study is to help identify characteristics of files that can be used as predictors of fault-proneness, thereby helping organizations determine how best to use their testing resources.
Abstract: A case study is presented using thirteen releases of a large industrial inventory tracking system. Several types of questions are addressed in this study. The first involved examining how faults are distributed over the different files. This included making a distinction between the release during which they were discovered, the lifecycle stage at which they were first detected, and the severity of the fault. The second category of questions we considered involved studying how the size of modules affected their fault density. This included looking at questions like whether or not files with high fault densities at early stages of the lifecycle also had high fault densities during later stages. A third type of question we considered was whether files that contained large numbers of faults during early stages of development, also had large numbers of faults during later stages, and whether faultiness persisted from release to release. Finally, we examined whether newly written files were more fault-prone than ones that were written for earlier releases of the product. The ultimate goal of this study is to help identify characteristics of files that can be used as predictors of fault-proneness, thereby helping organizations determine how best to use their testing resources.
275 citations
Cited by
More filters
••
TL;DR: The notion of adequacy criteria is examined together with its role in software dynamic testing and the methods for comparison and assessment of criteria are reviewed.
Abstract: Objective measurement of test quality is one of the key issues in software testing. It has been a major research focus for the last two decades. Many test criteria have been proposed and studied for this purpose. Various kinds of rationales have been presented in support of one criterion or another. We survey the research work in this area. The notion of adequacy criteria is examined together with its role in software dynamic testing. A review of criteria classification is followed by a summary of the methods for comparison and assessment of criteria.
1,287 citations
••
TL;DR: This paper surveys each area of minimization, selection and prioritization technique and discusses open problems and potential directions for future research.
Abstract: Regression testing is a testing activity that is performed to provide confidence that changes do not harm the existing behaviour of the software. Test suites tend to grow in size as software evolves, often making it too costly to execute entire test suites. A number of different approaches have been studied to maximize the value of the accrued test suite: minimization, selection and prioritization. Test suite minimization seeks to eliminate redundant test cases in order to reduce the number of tests to run. Test case selection seeks to identify the test cases that are relevant to some set of recent changes. Test case prioritization seeks to order test cases in such a way that early fault detection is maximized. This paper surveys each area of minimization, selection and prioritization technique and discusses open problems and potential directions for future research. Copyright © 2010 John Wiley & Sons, Ltd.
1,276 citations
••
16 May 1999TL;DR: This paper describes techniques for dynamically discovering invariants, along with an instrumenter and an inference engine that embody these techniques, and reports on the application of the engine to two sets of target programs.
Abstract: Explicitly stated program invariants can help programmers by identifying program properties that must be preserved when modifying code. In practice, however, these invariants are usually implicit. An alternative to expecting programmers to fully annotate code with invariants is to automatically infer likely invariants from the program itself. This research focuses on dynamic techniques for discovering invariants from execution traces. This article reports three results. First, it describes techniques for dynamically discovering invariants, along with an implementation, named Daikon, that embodies these techniques. Second, it reports on the application of Daikon to two sets of target programs. In programs from Gries's work (1981) on program derivation, the system rediscovered predefined invariants. In a C program lacking explicit invariants, the system discovered invariants that assisted a software evolution task. These experiments demonstrate that, at least for small programs, invariant inference is both accurate and useful. Third, it analyzes scalability issues, such as invariant detection runtime and accuracy, as functions of test suites and program points instrumented.
1,219 citations
••
TL;DR: Test case prioritization techniques schedule test cases for execution in an order that attempts to increase their effectiveness at meeting some performance goal as discussed by the authors, such as rate of fault detection, a measure of how quickly faults are detected within the testing process.
Abstract: Test case prioritization techniques schedule test cases for execution in an order that attempts to increase their effectiveness at meeting some performance goal. Various goals are possible; one involves rate of fault detection, a measure of how quickly faults are detected within the testing process. An improved rate of fault detection during testing can provide faster feedback on the system under test and let software engineers begin correcting faults earlier than might otherwise be possible. One application of prioritization techniques involves regression testing, the retesting of software following modifications; in this context, prioritization techniques can take advantage of information gathered about the previous execution of test cases to obtain test case orderings. We describe several techniques for using test execution information to prioritize test cases for regression testing, including: 1) techniques that order test cases based on their total coverage of code components; 2) techniques that order test cases based on their coverage of code components not previously covered; and 3) techniques that order test cases based on their estimated ability to reveal faults in the code components that they cover. We report the results of several experiments in which we applied these techniques to various test suites for various programs and measured the rates of fault detection achieved by the prioritized test suites, comparing those rates to the rates achieved by untreated, randomly ordered, and optimally ordered suites.
1,200 citations
••
07 Nov 2005TL;DR: The studies show that, on the same set of subjects, the Tarantula technique consistently outperforms the other four techniques in terms of effectiveness in fault localization, and is comparable in efficiency to the least expensive of the other five techniques.
Abstract: The high cost of locating faults in programs has motivated the development of techniques that assist in fault localization by automating part of the process of searching for faults. Empirical studies that compare these techniques have reported the relative effectiveness of four existing techniques on a set of subjects. These studies compare the rankings that the techniques compute for statements in the subject programs and the effectiveness of these rankings in locating the faults. However, it is unknown how these four techniques compare with Tarantula, another existing fault-localization technique, although this technique also provides a way to rank statements in terms of their suspiciousness. Thus, we performed a study to compare the Tarantula technique with the four techniques previously compared. This paper presents our study---it overviews the Tarantula technique along with the four other techniques studied, describes our experiment, and reports and discusses the results. Our studies show that, on the same set of subjects, the Tarantula technique consistently outperforms the other four techniques in terms of effectiveness in fault localization, and is comparable in efficiency to the least expensive of the other four techniques.
1,142 citations