Proceedings Article
Detecting code smells using machine learning techniques: Are we there yet?
Dario Di Nucci, Fabio Palomba, Damian A. Tamburri, Alexander Serebrenik, Andrea De Lucia
pp. 612-621
TLDR
The results reveal that, with this configuration, machine learning techniques show critical limitations in the state of the art that deserve further research.
Abstract:
Code smells are symptoms of poor design and implementation choices that weigh heavily on the quality of the produced source code. Over the last decades, several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature and approach of the detection. A recent work proposed the use of Machine-Learning (ML) techniques for code smell detection, possibly solving the issue of tool subjectivity by giving a learner the ability to discern between smelly and non-smelly source code elements. While that work opened a new perspective for code smell detection, it only considered the case where each dataset used to train and test the machine learners contains instances affected by a single type of smell. In this work we replicate the study with a different dataset configuration containing instances of more than one type of smell. The results reveal that, with this configuration, the machine learning techniques show critical limitations in the state of the art, which deserve further research.
Citations
A GQM-based Method and a Bayesian Approach for the Detection of Code and Design Smells
TL;DR: In this paper, a probabilistic model is proposed to detect occurrences of the Blob antipattern in code and design smells in programs, which can be calibrated using machine learning techniques to offer an improved, context-specific detection.
Journal Article
Machine learning techniques for code smell detection: A systematic literature review and meta-analysis
TL;DR: There is still room for the improvement of machine learning techniques in the context of code smell detection and it is argued that JRip and Random Forest are the most effective classifiers in terms of performance.
Journal Article
Beyond Technical Aspects: How Do Community Smells Influence the Intensity of Code Smells?
Fabio Palomba, Damian A. Tamburri, Francesca Arcelli Fontana, Rocco Oliveto, Andy Zaidman, Alexander Serebrenik
TL;DR: A mixed-methods empirical study of 117 releases from 9 open-source systems finds that community-related factors contribute to the intensity of code smells, supporting the joint use of community and code smells detection as a mechanism for the joint management of technical and social problems around software development communities.
Proceedings Article
Comparing heuristic and machine learning approaches for metric-based code smell detection
TL;DR: A large-scale study that empirically compares the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection; it considers five code smell types and compares machine learning models with DECOR, a state-of-the-art heuristic-based approach.
Proceedings Article
Deep learning based feature envy detection
Hui Liu, Zhifeng Xu, Yanzhen Zou
TL;DR: This paper proposes a novel deep-learning-based approach to detecting feature envy, one of the most common code smells, together with an automatic approach to generating labeled training data for the neural-network-based classifier that does not require any human intervention.
References
Journal Article
Cross-Validatory Choice and Assessment of Statistical Predictions
TL;DR: In this article, a generalized form of the cross-validation criterion is applied to the choice and assessment of prediction using the data-analytic concept of a prescription, and examples used to illustrate the application are drawn from the problem areas of univariate estimation, linear regression and analysis of variance.
Journal Article
A caution regarding rules of thumb for variance inflation factors.
TL;DR: In this article, the authors examined the effect of the variance inflation factor (VIF) on the results of regression analyses, and found that threshold values of the VIF need to be evaluated in the context of several other factors that influence the variance of regression coefficients.
A Practical Guide to Support Vector Classification
TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.
Journal Article
Random search for hyper-parameter optimization
James Bergstra, Yoshua Bengio
TL;DR: This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
Book
Refactoring: Improving the Design of Existing Code
TL;DR: Almost every expert in Object-Oriented Development stresses the importance of iterative development, but how do you add function to the existing code base while still preserving its design integrity?