Proceedings ArticleDOI

Detecting code smells using machine learning techniques: Are we there yet?

TL;DR: The results reveal that, with this dataset configuration, the machine learning techniques exhibit critical limitations in the state of the art which deserve further research.
Abstract: Code smells are symptoms of poor design and implementation choices that weigh heavily on the quality of produced source code. During the last decades, several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature and approach of the detection. In a recent work, the use of Machine-Learning (ML) techniques for code smell detection has been proposed, possibly solving the issue of tool subjectivity by giving a learner the ability to discern between smelly and non-smelly source code elements. While this work opened a new perspective for code smell detection, it only considered the case where each dataset used to train and test the machine learners contains instances affected by a single type of smell. In this work we replicate the study with a different dataset configuration containing instances of more than one type of smell. The results reveal that with this configuration the machine learning techniques exhibit critical limitations in the state of the art which deserve further research.
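
To make the replicated setup concrete, here is a minimal sketch (not the authors' actual pipeline) of training and cross-validating a binary smell classifier on a dataset whose non-smelly class also contains elements affected by other smell types; the metric names and data are invented for illustration.

```python
# Minimal sketch, NOT the paper's pipeline: cross-validating a binary
# smell classifier on a multi-smell dataset configuration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Hypothetical code metrics (e.g., LOC, WMC, LCOM) for 300 classes.
X = rng.normal(size=(300, 3))
# Label 1 = affected by the target smell (e.g., God Class); label 0
# covers both clean elements AND elements affected by other smells,
# which is what makes this harder than the one-smell-per-dataset setup.
y = rng.integers(0, 2, size=300)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
print(f"10-fold F1: {scores.mean():.2f} +/- {scores.std():.2f}")
```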
Citations
01 Jan 2011
TL;DR: In this paper, a systematic process converts existing state-of-the-art detection rules into a probabilistic model for detecting code and design smells, illustrated on the Blob antipattern; when past detection results are available, the model can be calibrated using machine learning techniques to offer improved, context-specific detection.
Abstract: The presence of code and design smells can have a severe impact on the quality of a program. Consequently, their detection and correction have drawn the attention of both researchers and practitioners who have proposed various approaches to detect code and design smells in programs. However, none of these approaches handle the inherent uncertainty of the detection process. We propose a Bayesian approach to manage this uncertainty. First, we present a systematic process to convert existing state-of-the-art detection rules into a probabilistic model. We illustrate this process by generating a model to detect occurrences of the Blob antipattern. Second, we present results of the validation of the model: we built this model on two open-source programs, GanttProject v1.10.2 and Xerces v2.7.0, and measured its accuracy. Third, we compare our model with another approach to show that it returns the same candidate classes while ordering them to minimise the quality analysts' effort. Finally, we show that when past detection results are available, our model can be calibrated using machine learning techniques to offer an improved, context-specific detection.
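
The sketch below illustrates the general idea of replacing a crisp detection rule with a probabilistic score, in the spirit of (but not identical to) the paper's Bayesian model for the Blob antipattern; the metrics, thresholds, and probabilities are hypothetical, and a real model would calibrate them from labelled data.

```python
# Illustrative sketch only: naive-Bayes-style combination of two
# Blob symptoms into a probability used to rank candidate classes.
from dataclasses import dataclass

@dataclass
class ClassMetrics:
    name: str
    nmd_nad: int   # number of methods + attributes (size symptom)
    lcom5: float   # cohesion metric (higher = less cohesive)

# Hypothetical P(symptom | Blob) and P(symptom | not Blob).
P_LARGE_GIVEN_BLOB, P_LARGE_GIVEN_OK = 0.9, 0.2
P_LOWCOH_GIVEN_BLOB, P_LOWCOH_GIVEN_OK = 0.8, 0.3
PRIOR_BLOB = 0.1

def blob_probability(c: ClassMetrics) -> float:
    """Combine symptom observations into P(Blob | symptoms)."""
    large = c.nmd_nad > 30
    lowcoh = c.lcom5 > 0.8
    p_b, p_ok = PRIOR_BLOB, 1 - PRIOR_BLOB
    for observed, p_given_b, p_given_ok in (
        (large, P_LARGE_GIVEN_BLOB, P_LARGE_GIVEN_OK),
        (lowcoh, P_LOWCOH_GIVEN_BLOB, P_LOWCOH_GIVEN_OK),
    ):
        p_b *= p_given_b if observed else 1 - p_given_b
        p_ok *= p_given_ok if observed else 1 - p_given_ok
    return p_b / (p_b + p_ok)

candidates = [ClassMetrics("GanttGraphicArea", 55, 0.95),
              ClassMetrics("DateUtils", 12, 0.40)]
# Rank candidates so analysts inspect the most likely Blobs first.
for c in sorted(candidates, key=blob_probability, reverse=True):
    print(f"{c.name}: P(Blob) = {blob_probability(c):.2f}")
```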

165 citations

Journal ArticleDOI
TL;DR: There is still room for improvement of machine learning techniques in the context of code smell detection, and JRip and Random Forest emerge as the most effective classifiers in terms of performance.
Abstract: Background: Code smells indicate suboptimal design or implementation choices in the source code that often make it more change- and fault-prone. Researchers defined dozens of code smell detectors, which exploit different sources of information to support developers when diagnosing design flaws. Despite their good accuracy, previous work pointed out three important limitations that might preclude the use of code smell detectors in practice: (i) subjectiveness of developers with respect to code smells detected by such tools, (ii) scarce agreement between different detectors, and (iii) difficulties in finding good thresholds to be used for detection. To overcome these limitations, the use of machine learning techniques represents an ever-increasing research area. Objective: While the research community carefully studied the methodologies applied by researchers when defining heuristic-based code smell detectors, there is still a noticeable lack of knowledge on how machine learning approaches have been adopted for code smell detection and whether there are points of improvement to allow a better detection of code smells. Our goal is to provide an overview and discuss the usage of machine learning approaches in the field of code smells. Method: This paper presents a Systematic Literature Review (SLR) on Machine Learning Techniques for Code Smell Detection. Our work considers papers published between 2000 and 2017. Starting from an initial set of 2456 papers, we found that 15 of them actually adopted machine learning approaches. We studied them under four different perspectives: (i) code smells considered, (ii) setup of machine learning approaches, (iii) design of the evaluation strategies, and (iv) a meta-analysis on the performance achieved by the models proposed so far. Results: The analyses performed show that God Class, Long Method, Functional Decomposition, and Spaghetti Code have been heavily considered in the literature. Decision Trees and Support Vector Machines are the most commonly used machine learning algorithms for code smell detection. Models based on a large set of independent variables have performed well. JRip and Random Forest are the most effective classifiers in terms of performance. The analyses also reveal the existence of several open issues and challenges that the research community should focus on in the future. Conclusion: Based on our findings, we argue that there is still room for the improvement of machine learning techniques in the context of code smell detection. The open issues that emerged in this study can represent the input for researchers interested in developing more powerful techniques.

148 citations


Cites background from "Detecting code smells using machine..."

  • ...As shown in a recent work [97], the dataset might influence the performance of machine learning models....


Journal ArticleDOI
TL;DR: A mixed-methods empirical study of 117 releases from 9 open-source systems finds that community-related factors contribute to the intensity of code smells, supporting the combined use of community smell and code smell detection as a mechanism for the joint management of technical and social problems around software development communities.
Abstract: Code smells are poor implementation choices applied by developers during software evolution that often lead to critical flaws or failures. Much in the same way, community smells reflect the presence of organizational and socio-technical issues within a software community that may lead to additional project costs. Recent empirical studies provide evidence that community smells are often, if not always, connected to circumstances such as code smells. In this paper we look deeper into this connection by conducting a mixed-methods empirical study of 117 releases from 9 open-source systems. The qualitative and quantitative sides of our mixed-methods study were run in parallel and assume a mutually confirmative connotation. On the one hand, we survey 162 developers of the 9 considered systems to investigate whether developers perceive a relationship between community smells and the code smells found in those projects. On the other hand, we perform a fine-grained analysis of the 117 releases in our dataset to measure the extent to which community smells impact code smell intensity (i.e., criticality). We then propose a code smell intensity prediction model that relies on both technical and community-related aspects. The results of both sides of our mixed-methods study lead to one conclusion: community-related factors contribute to the intensity of code smells. This conclusion supports the joint use of community smell and code smell detection as a mechanism for the joint management of technical and social problems around software development communities.
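
As a rough illustration of the kind of intensity model the paper proposes, the sketch below fits a regressor on a mix of technical and community-related predictors; the features, weights, and data are synthetic stand-ins, not the study's actual variables.

```python
# Sketch: predicting code smell intensity from technical AND
# community-related predictors (all data here is synthetic).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
technical = rng.normal(size=(n, 2))   # e.g., size/complexity metrics
community = rng.normal(size=(n, 2))   # e.g., turnover, communication gaps
X = np.hstack([technical, community])
# Synthetic target: intensity grows with both kinds of factors.
y = X @ np.array([0.5, 0.3, 0.4, 0.6]) + rng.normal(scale=0.2, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=10, scoring="r2")
print("mean R^2 over 10 folds:", round(scores.mean(), 2))
```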

87 citations

Proceedings ArticleDOI
25 May 2019
TL;DR: A large-scale study empirically compares the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection, considering five code smell types and comparing machine learning models with DECOR, a state-of-the-art heuristic-based approach.
Abstract: Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on source code maintainability and comprehensibility has been widely shown in the past, and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics: they compute a set of code metrics and combine them by creating detection rules. While they have reasonable accuracy, a recent trend is the use of machine learning, where code metrics are used as predictors of the smelliness of code artefacts. Despite the recent advances in the field, there is still a noticeable lack of knowledge of whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection. We consider five code smell types and compare machine learning models with Decor, a state-of-the-art heuristic-based approach. Key findings emphasize the need for further research aimed at improving the effectiveness of both machine learning and heuristic approaches for code smell detection: while Decor generally achieves better performance than a machine learning baseline, its precision is still too low to make it usable in practice.
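
The two sides of such a comparison can be sketched as follows, with a fixed-threshold rule standing in for a DECOR-style heuristic (this is not DECOR itself) and a random forest as the machine learning baseline; the metrics, thresholds, and ground truth are all synthetic.

```python
# Sketch: heuristic (fixed thresholds) vs. learned detector, scored
# with precision/recall on the same hypothetical metric data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(loc=10, scale=3, size=(400, 2))     # two invented metrics
y = ((X[:, 0] > 12) & (X[:, 1] > 11)).astype(int)  # synthetic ground truth

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1, stratify=y)

# Heuristic detector: crisp metric thresholds, DECOR-style in spirit only.
heur = ((X_te[:, 0] > 13) | (X_te[:, 1] > 13)).astype(int)

# ML detector: code metrics as predictors of smelliness.
ml = RandomForestClassifier(random_state=1).fit(X_tr, y_tr).predict(X_te)

for name, pred in (("heuristic", heur), ("machine learning", ml)):
    print(f"{name}: precision={precision_score(y_te, pred):.2f} "
          f"recall={recall_score(y_te, pred):.2f}")
```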

70 citations


Cites background or result from "Detecting code smells using machine..."

  • ...by previous work [34], the composition of the dataset might influence the performance of a technique; this is especially true in the case of code smell detection, where a detector should recognize code smells over datasets that are both unbalanced (i....


  • ...Although the use of machine learning looks promising, its actual accuracy for code smell detection is still under debate, as previous work has observed contrasting results [32], [34]....


  • ...[34] demonstrated that, in a real use-case scenario, the results achieved by Arcelli Fontana et al....


Proceedings ArticleDOI
03 Sep 2018
TL;DR: A novel deep-learning-based approach to detecting feature envy, one of the most common code smells, is proposed, together with an automatic approach to generating labeled training data for the neural-network-based classifier that does not require any human intervention.
Abstract: Software refactoring is widely employed to improve software quality. A key step in software refactoring is to identify which part of the software should be refactored. To facilitate the identification, a number of approaches have been proposed to identify certain structures in the code (called code smells) that suggest the possibility of refactoring. Most such approaches rely on manually designed heuristics to map manually selected source code metrics to predictions. However, it is challenging to manually select the best features, especially textual features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a novel deep-learning-based approach to detecting feature envy, one of the most common code smells. The key insight is that deep neural networks and advanced deep learning techniques can automatically select features (especially textual features) of source code for feature envy detection, and can automatically build the complex mapping between such features and predictions. We also propose an automatic approach to generating labeled training data for the neural-network-based classifier, which does not require any human intervention. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state of the art in both detecting feature envy smells and recommending destinations for identified smelly methods.
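
A rough sketch of the paper's framing, not its actual deep architecture: a small neural network learns to map features of a method-class pair to an envy/no-envy decision, with labels produced automatically rather than by humans. The distance features and labeling rule below are invented stand-ins; the real approach also feeds textual information about identifiers to the model.

```python
# Sketch: a small MLP stands in for the paper's deep model; features
# and auto-generated labels are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
n = 500
# Hypothetical features: coupling of a method to its enclosing class
# vs. coupling to the closest foreign class.
dist_own = rng.uniform(0, 1, n)
dist_other = rng.uniform(0, 1, n)
X = np.column_stack([dist_own, dist_other])
# Fake auto-generated labels: "envy" when the method is markedly
# closer to a foreign class than to its own.
y = (dist_other < dist_own - 0.1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                    random_state=7)
net.fit(X_tr, y_tr)
print("held-out accuracy:", round(net.score(X_te, y_te), 2))
```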

64 citations


Cites background from "Detecting code smells using machine..."

  • ...Such machine learning based approaches have proved to be effective and efficient although some experimental evaluation also reveals their significant limitations [41]....


  • ...However, empirical studies [41] suggest that such statistical machine learning based smell detection...


References
Journal ArticleDOI
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation; they are used to show the response to increasing the number of features used in splitting, and the ideas are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, 1996, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
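
scikit-learn's random forest exposes the internal estimates the abstract describes; this small sketch (on synthetic data) prints the out-of-bag accuracy, i.e., one minus the internally estimated error, and the per-feature variable importances.

```python
# Sketch of the "internal estimates": out-of-bag score and variable
# importance from a random forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=0)
forest = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",   # random feature subset at each split
    oob_score=True,        # estimate generalization without a test set
    random_state=0,
)
forest.fit(X, y)
print("OOB accuracy:", round(forest.oob_score_, 3))
print("variable importances:", forest.feature_importances_.round(3))
```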

79,257 citations


"Detecting code smells using machine..." refers methods in this paper


  • ...The results achieved by Arcelli Fontana et al. [1] reported that most of the classifiers have accuracy and F-Measure higher than 95%, with J48 and RANDOM FOREST being the most powerful ML techniques....


  • ...The best performance (for all the smells) is achieved by the tree-based classifiers, i.e., RANDOM FOREST and J48: this confirms the results of the reference study, which highlighted how this type of classifiers perform better than the others....


  • ...In their study, they found that all the machine learners experimented achieved high performance in a cross-project scenario, with the J48 and RANDOM FOREST classifiers obtaining the highest accuracy....


  • ...Arcelli Fontana et al. [1] evaluated six basic ML techniques: J48 [61], JRIP [62], RANDOM FOREST [63], NAIVE BAYES [64], SMO [65], and LIBSVM [66]....


Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
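
scikit-learn's SVC is implemented on top of LIBSVM, so the parameter-selection issue the article discusses can be illustrated with a small grid search over the C-SVC cost parameter and the RBF kernel width (synthetic data; the grid values are arbitrary choices).

```python
# Sketch: tuning C-SVC (LIBSVM via scikit-learn) with an RBF kernel.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
grid = GridSearchCV(
    SVC(kernel="rbf"),                          # C-SVC with RBF kernel
    {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]},
    cv=5,
)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("cross-validated accuracy:", round(grid.best_score_, 3))
```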

40,826 citations


"Detecting code smells using machine..." refers methods in this paper


  • ...As for J48, the three types of pruning techniques available in WEKA [67] were used, SMO was based on two kernels (e.g., POLYNOMIAL and RBF), while for LIBSVM eight different configurations, using C-SVC and V-SVC, were used....


  • ...Arcelli Fontana et al. [1] evaluated six basic ML techniques: J48 [61], JRIP [62], RANDOM FOREST [63], NAIVE BAYES [64], SMO [65], and LIBSVM [66]....


Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
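
A brief stand-in for the C4.5 workflow: induce a tree from labelled cases, then print it back as readable if-then tests. Note that scikit-learn implements CART rather than C4.5, so this only approximates the book's system.

```python
# Sketch: induce a decision tree and read it back as if-then rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(
    criterion="entropy",  # information gain (C4.5 uses the related gain ratio)
    max_depth=3,
    random_state=0,
)
tree.fit(data.data, data.target)
# The tree as nested if-then tests: the "understandable model" the book stresses.
print(export_text(tree, feature_names=list(data.feature_names)))
```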

21,674 citations

Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Detecting code smells using machine..." refers methods in this paper

  • ...As for J48, the three types of pruning techniques available in WEKA [67] were used, SMO was based on two kernels (e.g., POLYNOMIAL and RBF), while for LIBSVM eight different configurations, using C-SVC and V-SVC, were used....


  • ...As for the experimented prediction models, we exploited the implementation provided by the WEKA framework [67], which is widely considered as a reliable tool....



Journal ArticleDOI
TL;DR: In this article, a method of over-sampling the minority class by creating synthetic minority class examples is proposed and evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
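
The imbalanced-learn library makes the paper's combination easy to sketch: SMOTE to synthesize minority examples, followed by random under-sampling of the majority class, scored by AUC. The data are synthetic and the sampling ratios below are arbitrary choices, not the paper's.

```python
# Sketch: SMOTE over-sampling + majority under-sampling, evaluated by AUC.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# ~5% minority class, mimicking an imbalanced real-world dataset.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

pipe = Pipeline([
    # Synthesize minority examples up to half the majority count.
    ("smote", SMOTE(sampling_strategy=0.5, random_state=0)),
    # Then trim the majority class down to a 1:1 ratio.
    ("under", RandomUnderSampler(sampling_strategy=1.0, random_state=0)),
    ("tree", DecisionTreeClassifier(random_state=0)),  # C4.5-like learner
])
auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print("mean AUC:", round(auc.mean(), 3))
```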

17,313 citations