The article was published on 2012-01-01 and is currently open access. It has received 23 citations to date. The article focuses on the topics: Data quality & Quality (business).
TL;DR: This paper finds that: 1) within-company predictors are weak for small data sets; 2) the Peters filter plus cross-company data builds better predictors than both within-company data and the Burak filter plus cross-company data; and 3) the Peters filter builds 64% more useful predictors than both the within-company and the Burak-filter cross-company approaches.
TL;DR: This review aims to aid understanding of the various elements of the fault prediction process and to explore the issues involved in software fault prediction.
TL;DR: This paper uses data on 154 projects from 2 sources to investigate transfer learning between different time intervals and 195 projects from 51 sources to provide evidence on the value of transfer learning for traditional cross-company learning problems.
TL;DR: Some of the relationships between strands of closely related work in probabilistic reasoning and machine learning for Software Engineering are explored, arguing that they have much in common, and some future challenges in the area of AI for SE are set out.
TL;DR: The Hoeffding Tree approach is shown to produce a more stable and robust model than traditional data mining approaches and shows promise in terms of informing software development activities in order to minimize the chance of failure.
TL;DR: This work examines data from two major open source projects, the Apache web server and the Mozilla browser, and quantifies aspects of developer participation, core team size, code ownership, productivity, defect density, and problem resolution intervals for these OSS projects.
TL;DR: The effects of sample size on feature selection and error estimation for several types of classifiers are discussed and an emphasis is placed on giving practical advice to designers and users of statistical pattern recognition systems.
TL;DR: The infrastructure that is being designed and constructed to support controlled experimentation with testing and regression testing techniques is described and the impact that this infrastructure has had and can be expected to have.
TL;DR: It is argued that software engineering is ideal for the application of metaheuristic search techniques, such as genetic algorithms, simulated annealing and tabu search, which could provide solutions to the difficult problems of balancing competing constraints.
Q1. What contributions have the authors mentioned in the paper "On software engineering repositories and their open problems" ?
In this paper the authors present a survey of the publicly available repositories and classify the most common ones as well as discussing the problems faced by researchers when applying machine learning or statistical techniques to them.
Q2. What are the main problems in software engineering?
Although some of the problems, such as outliers or noise, have been extensively studied in software engineering, others need further research: in particular, imbalance and data shifting from the machine learning point of view, and replicability in general, which requires providing not only the data but also the tools needed to replicate the empirical work.
Q3. What is the common practice in data mining?
It is customary in data mining to perform the evaluation using cross-validation, i.e., to divide the dataset into k folds for training and testing and report the average over the k folds.
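The k-fold procedure described above can be sketched in plain Python. The dataset, the majority-class "model", and the accuracy scoring below are hypothetical stand-ins for a real defect-prediction setup, not taken from the paper:

```python
# Minimal k-fold cross-validation sketch (no ML libraries required).
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, k=10):
    """Train on k-1 folds, test on the held-out fold, average the k scores."""
    folds = k_fold_indices(len(labels), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        # Toy "model": predict the majority class seen in the training folds.
        majority = max(set(labels[j] for j in train),
                       key=lambda c: sum(labels[j] == c for j in train))
        correct = sum(labels[j] == majority for j in test)
        scores.append(correct / len(test))
    return sum(scores) / k

labels = [0] * 80 + [1] * 20  # imbalanced, as is typical of defect data
print(round(cross_validate(labels, k=10), 2))  # → 0.8
```

Note that with the imbalanced labels above the majority-class baseline already scores 0.8, which is why the survey stresses imbalance as an open problem: plain accuracy under cross-validation can look good while the minority (defective) class is never predicted.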
Q4. What are the main problems of the ISBSG?
Some of the repositories, such as the ISBSG, are composed of a large number of attributes; however, many of those attributes have missing values that need to be discarded in order to apply machine learning algorithms.
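A minimal sketch of this discarding step, in plain Python: drop attributes (columns) whose fraction of missing values is too high, then drop any rows still missing a kept attribute. The field names and the 40% threshold are illustrative assumptions, not taken from ISBSG:

```python
# Hypothetical ISBSG-style records; None marks a missing value.
rows = [
    {"size_fp": 120, "effort_h": 900,  "team_size": None},
    {"size_fp": 300, "effort_h": 2100, "team_size": 5},
    {"size_fp": None, "effort_h": 450, "team_size": None},
    {"size_fp": 80,  "effort_h": None, "team_size": 3},
]

def usable_columns(rows, max_missing=0.4):
    """Keep columns whose fraction of missing (None) values is acceptable."""
    return [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing]

cols = usable_columns(rows)
# Drop any row that is still missing one of the kept columns.
clean = [{c: r[c] for c in cols} for r in rows
         if all(r[c] is not None for c in cols)]
print(cols, len(clean))  # → ['size_fp', 'effort_h'] 2
```

The trade-off the survey points at is visible even in this toy example: tightening the threshold keeps more rows but fewer attributes, and vice versa, so the cleaned dataset can end up much smaller than the raw repository.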
Q5. What are the main authors of the paper?
A. Mockus, R. T. Fielding, and J. D. Herbsleb, “Two case studies of Open Source software development: Apache and Mozilla,” ACM Transactions on Software Engineering and Methodology, vol. 11, no.
Q6. Who is the author of this article?
A. Fernández, S. García, and F. Herrera, “Addressing the classification with imbalanced data: Open problems and new challenges on class distribution,” in 6th International Conference on Hybrid Artificial Intelligence Systems (HAIS), 2011, pp. 1–10.
[32] R. Lincke, J. Lundberg, and W. Löwe, “Comparing software metrics tools,” in Proceedings of the 2008 International Symposium on Software Testing and Analysis (ISSTA’08).