
Showing papers by "Thomas J. Ostrand" published in 2008


Journal ArticleDOI
TL;DR: It is concluded that while many factors can “spoil the broth” (lead to the release of software with too many defects), the number of developers is not a major influence.
Abstract: Fault prediction by negative binomial regression models is shown to be effective for four large production software systems from industry. A model developed originally with data from systems with regularly scheduled releases was successfully adapted to a system without releases to identify 20% of that system's files that contained 75% of the faults. A model with a pre-specified set of variables derived from earlier research was applied to three additional systems, and proved capable of identifying averages of 81, 94 and 76% of the faults in those systems. A primary focus of this paper is to investigate the impact on predictive accuracy of using data about the number of developers who access individual code units. For each system, including the cumulative number of developers who had previously modified a file yielded no more than a modest improvement in predictive accuracy. We conclude that while many factors can "spoil the broth" (lead to the release of software with too many defects), the number of developers is not a major influence.
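
To make the modeling approach concrete, here is a minimal sketch of file-level fault prediction with negative binomial regression, written in Python with statsmodels. The predictor columns (kloc, prior_faults, cum_developers) and the tiny data set are illustrative assumptions, not the paper's actual variables or data.

```python
# A minimal sketch (not the authors' code) of fault prediction with
# negative binomial regression. Predictor names and values are
# illustrative assumptions only.
import pandas as pd
import statsmodels.api as sm

# One row per file: predictors from the prior release, plus the fault
# count observed in the release being modeled.
train = pd.DataFrame({
    "kloc":           [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 1.8, 0.6],
    "prior_faults":   [3, 0, 7, 1, 4, 0, 2, 1],
    "cum_developers": [5, 1, 9, 2, 6, 1, 4, 2],
    "faults":         [4, 0, 9, 1, 5, 0, 3, 1],
})

X = sm.add_constant(train[["kloc", "prior_faults", "cum_developers"]])
# The dispersion parameter is left at its statsmodels default for simplicity.
model = sm.GLM(train["faults"], X,
               family=sm.families.NegativeBinomial()).fit()

# Rank files by expected fault count, most fault-prone first.
train["expected_faults"] = model.predict(X)
print(train.sort_values("expected_faults", ascending=False))
```

In practice the model would be fit on earlier releases and applied to predictors from the current one; the fit-and-predict call on the same frame here is only to keep the sketch short.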

173 citations


Patent
11 Jun 2008
TL;DR: In this patent, the authors propose a method for predicting the fault-proneness of code units (files, modules, packages, and the like) of large-scale, long-lived software systems.
Abstract: A method, apparatus, and computer-readable medium for predicting the fault-proneness of code units (files, modules, packages, and the like) of large-scale, long-lived software systems. The method collects information about the code units and the development process from previous releases, and formats this information for input to an analysis stage. The method then performs a statistical regression analysis on the collected data, and formulates a model to predict fault counts for code units of the current and future releases. Finally, the method computes an expected fault count for each code unit in the current release by applying the formulated model to data from the current release. The expected fault counts are used to rank the release units in descending order of fault-proneness so that debugging efforts and resources can be optimized.
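
The final ranking stage the patent describes is straightforward to illustrate. The sketch below assumes expected fault counts have already been produced by the fitted model (the file names and numbers are fabricated) and simply sorts code units in descending order of fault-proneness, flagging the top 20% for focused debugging effort.

```python
# Illustrative ranking stage only; the expected fault counts below are
# fabricated stand-ins for a fitted model's output.
predicted = {
    "parser.c": 6.2, "net/io.c": 4.8, "ui/menu.c": 0.3,
    "db/query.c": 3.9, "util/str.c": 0.1, "core/sched.c": 5.5,
    "log.c": 0.7, "cfg.c": 0.2, "auth.c": 1.1, "cache.c": 2.4,
}

# Descending order of fault-proneness.
ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)

# Flag the top 20% of units for prioritized testing and debugging.
top_n = max(1, len(ranked) // 5)
for name, count in ranked[:top_n]:
    print(f"prioritize {name}: expected ~{count:.1f} faults")
```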

22 citations


Proceedings ArticleDOI
12 May 2008
TL;DR: Two different software fault prediction models have been used to predict the N% of the files of a large software system that are likely to contain the largest numbers of faults, and their effectiveness on three large industrial software systems is compared.
Abstract: Two different software fault prediction models have been used to predict the N% of the files of a large software system that are likely to contain the largest numbers of faults. We used the same predictor variables in a negative binomial regression model and a recursive partitioning model, and compared their effectiveness on three large industrial software systems. The negative binomial model identified files that contain 76 to 93 percent of the faults, and recursive partitioning identified files that contain 68 to 85 percent.
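
A rough sketch of how the two model families can be compared on the same predictors follows, using statsmodels for the negative binomial regression and a CART regression tree from scikit-learn as the recursive-partitioning model. The data are synthetic, and the evaluation (percent of faults contained in the top 20% of files, scored on the training data for brevity) only loosely mirrors the paper's methodology.

```python
# Synthetic-data comparison sketch; not the paper's data or exact setup.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "kloc": rng.exponential(1.0, n),
    "prior_faults": rng.poisson(2, n),
})
df["faults"] = rng.poisson(0.5 + 2 * df["kloc"] + 0.8 * df["prior_faults"])

X = df[["kloc", "prior_faults"]]
nb = sm.GLM(df["faults"], sm.add_constant(X),
            family=sm.families.NegativeBinomial()).fit()
tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0)
tree.fit(X, df["faults"])

def pct_faults_in_top20(scores):
    """Percent of all faults found in the 20% of files ranked highest."""
    top = np.argsort(np.asarray(scores))[::-1][: n // 5]
    return 100 * df["faults"].iloc[top].sum() / df["faults"].sum()

print("negative binomial:", round(pct_faults_in_top20(
    nb.predict(sm.add_constant(X))), 1))
print("recursive partitioning:", round(pct_faults_in_top20(
    tree.predict(X)), 1))
```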

13 citations


Proceedings Article
12 May 2008
TL;DR: The call for papers attracted submissions from Asia, Canada, Europe, and the United States and the program committee accepted over a dozen papers covering a variety of topics, including models related to fault prediction, effort estimation, and requirements engineering.
Abstract: It is our great pleasure to welcome you to PROMISE 2008 - the 4th International Workshop on Predictor Models in Software Engineering. This year's workshop continues its tradition of being the premier forum for presentation of research results and experience reports in the area of predictor models applied to software engineering.

The theme for this year's workshop is Bridging Research and Industry. Key questions for the workshop are:
- How might PROMISE and other researchers better align with the realities of industry?
- How can industry make effective use of research ideas?

In keeping with this theme, we have two keynote speakers from industry. Dr. Murray Cantor is an IBM Distinguished Engineer and the governance solutions lead on the IBM Rational Software CTO team. Mr. Chris Beal is a Sun Microsystems Senior Staff Engineer working with Solaris Revenue Product Engineering.

We are pleased that PROMISE has become an international event. Our call for papers attracted submissions from Asia, Canada, Europe, and the United States. The program committee accepted over a dozen papers covering a variety of topics, including models related to fault prediction, effort estimation, and requirements engineering.

One feature that sets this workshop apart from others is the PROMISE repository of software data sets that are publicly available for research purposes. The repository currently has 57 data sets and has grown at an average rate of 44% annually over the last 3.5 years.

12 citations


Book ChapterDOI
09 Apr 2008
TL;DR: Negative binomial regression models are developed and used to predict the expected number of faults in each file of the next release of a system, based on code characteristics and fault and modification history data.
Abstract: It would obviously be very valuable to know in advance which files in the next release of a large software system are most likely to contain the largest numbers of faults. This is true whether the goal is to validate the system by testing or formally verifying it, or by using some hybrid approach. To accomplish this, we developed negative binomial regression models and used them to predict the expected number of faults in each file of the next release of a system. The predictions are based on code characteristics and fault and modification history data. This paper discusses what we have learned from applying the model to several large industrial systems, each with multiple years of field exposure. It also discusses our success in making accurate predictions and some of the issues that had to be considered.
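
As one concrete illustration of the "fault and modification history data" the abstract refers to, the fragment below derives simple per-file history features from a change log. The log layout and field names are hypothetical, chosen only to show the kind of aggregation involved.

```python
# Hypothetical change log: one row per change to a file in a release.
import pandas as pd

changes = pd.DataFrame({
    "file":         ["a.c", "a.c", "b.c", "c.c", "a.c", "b.c"],
    "release":      [1, 1, 1, 1, 2, 2],
    "is_fault_fix": [True, False, True, False, True, True],
})

# Per-file history predictors available when modeling the next release:
history = changes.groupby("file").agg(
    prior_changes=("release", "size"),
    prior_faults=("is_fault_fix", "sum"),
)
print(history)
```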

8 citations


Proceedings ArticleDOI
20 Jul 2008
TL;DR: A comparison of two methods for classifying change reports for a large software system is described, and it is concluded that the stage of development when the report was initialized is a more accurate indicator of its fault status than the presence of certain keywords in the report's natural language description.
Abstract: A key problem when doing automated fault analysis and fault prediction from information in a software change management database is how to determine which change reports represent software faults. In some change management systems, there is no simple way to distinguish fault reports from changes made to add new functionality or perform routine maintenance. This paper describes a comparison of two methods for classifying change reports for a large software system, and concludes that, for that particular system, the stage of development when the report was initialized is a more accurate indicator of its fault status than the presence of certain keywords in the report's natural language description.
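
The contrast between the two classification methods is easy to state as code. The toy functions below implement a keyword heuristic over a report's natural language description and a stage heuristic over the development phase in which the report was opened; the keyword list and stage names are assumptions for illustration, not the paper's actual rules.

```python
# Toy versions of the two heuristics; keywords and stage names are
# illustrative assumptions, not taken from the paper.
FAULT_KEYWORDS = {"crash", "fail", "error", "wrong", "defect"}
POST_DEVELOPMENT_STAGES = {"system_test", "field", "customer"}

def keyword_says_fault(description: str) -> bool:
    """Heuristic 1: keyword match on the report's text."""
    text = description.lower()
    return any(kw in text for kw in FAULT_KEYWORDS)

def stage_says_fault(stage: str) -> bool:
    """Heuristic 2: reports opened after development are presumed faults."""
    return stage in POST_DEVELOPMENT_STAGES

report = {"description": "Menu fails to refresh after resize",
          "stage": "system_test"}
print(keyword_says_fault(report["description"]))  # True ("fail" matches)
print(stage_says_fault(report["stage"]))          # True
```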

4 citations


Proceedings ArticleDOI
10 May 2008
TL;DR: The PROMISE workshop seeks to deliver useful, usable, verifiable, and repeatable models to the software engineering community; the PROMISE repository is maintained to allow researchers to conduct repeatable software engineering experiments.
Abstract: The PROMISE workshop seeks to deliver to the software engineering community useful, usable, verifiable, and repeatable models. To provide a sound and realistic basis for creating predictive models, and to allow researchers to conduct repeatable software engineering experiments, we maintain the PROMISE repository, a growing collection that now contains 57 empirically based data sets.
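
For researchers who want to reproduce such experiments, a PROMISE data set is typically a single flat file, and many are distributed in ARFF format. A hedged loading example using SciPy follows; the local file name is hypothetical.

```python
# Load a locally downloaded PROMISE-style ARFF file; the file name
# here is a hypothetical local copy, not a fixed repository path.
from scipy.io import arff
import pandas as pd

data, meta = arff.loadarff("cm1.arff")
df = pd.DataFrame(data)

print(meta.names())   # attribute (column) names
print(df.head())      # first few records
```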

3 citations