Proceedings ArticleDOI

Detecting code smells using machine learning techniques: Are we there yet?

Abstract
Code smells are symptoms of poor design and implementation choices that weigh heavily on the quality of the produced source code. Over the last decades, several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature and approach of the detection. A recent work proposed the use of Machine-Learning (ML) techniques for code smell detection, possibly solving the issue of tool subjectivity by giving a learner the ability to discern between smelly and non-smelly source code elements. While this work opened a new perspective for code smell detection, it only considered the case where each dataset used to train and test the machine learners contains instances affected by a single type of smell. In this work we replicate the study with a different dataset configuration containing instances of more than one type of smell. The results show that, with this configuration, the machine learning techniques reveal critical limitations in the state of the art that deserve further research.


Citations
Journal ArticleDOI

Learning a graph-based classifier for fault localization

TL;DR: This paper proposes an approach called ClaFa that trains a graph-based fault classifier from bug fixes, built on a recent partial-code tool called Grapa, which enables the analysis of partial programs by the complete code tool called WALA.
Proceedings ArticleDOI

Do Research and Practice of Code Smell Identification Walk Together? A Social Representations Analysis

TL;DR: There is a considerable gap between the research of smell identification and its practice, and the theory of social representations may be useful to characterize the actual concerns of software developers.
Proceedings ArticleDOI

Towards Surgically-Precise Technical Debt Estimation: Early Results and Research Roadmap

TL;DR: In this paper, the authors focus on relatively simple regression modeling techniques and apply them to modeling the additional project cost connected to the sub-optimal conditions existing in the projects under study.
Proceedings ArticleDOI

An Empirical Study of Code Smells in Transformer-based Code Generation Techniques

TL;DR: Pylint and Bandit were used to investigate to what extent code smells are present in the datasets of code generation techniques and to verify whether they leak into the output of these techniques.
Proceedings ArticleDOI

A preliminary study on the adequacy of static analysis warnings with respect to code smell prediction

TL;DR: The main finding of the study is that the warnings given by the considered tools drastically increase the performance of code smell prediction models with respect to what was reported by previous research in the field.
References
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Journal ArticleDOI

The WEKA data mining software: an update

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Journal ArticleDOI

SMOTE: synthetic minority over-sampling technique

TL;DR: In this article, a method of over-sampling the minority class involves creating synthetic minority class examples, which is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.