Detecting code smells using machine learning techniques: Are we there yet?

doi:10.1109/SANER.2018.8330266

Proceedings Article

- pp 612-621

TLDR

The results reveal that, with this configuration, machine learning techniques show critical limitations in the state of the art that deserve further research.

Abstract:

Code smells are symptoms of poor design and implementation choices that weigh heavily on the quality of the produced source code. Over the last decades, several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature and approach of the detection. A recent work proposed the use of Machine-Learning (ML) techniques for code smell detection, possibly solving the issue of tool subjectivity by giving a learner the ability to discern between smelly and non-smelly source code elements. While this work opened a new perspective for code smell detection, it only considered the case where each dataset used to train and test the machine learners contains instances affected by a single type of smell. In this work we replicate the study with a different dataset configuration containing instances of more than one type of smell. The results reveal that, with this configuration, machine learning techniques show critical limitations in the state of the art that deserve further research.

Citations

Journal Article

Learning a graph-based classifier for fault localization

Hao Zhong, +1 more

- 09 May 2020 -

Science in China Series F: Information S...

TL;DR: This paper proposes an approach called ClaFa that trains a graph-based fault classifier from bug fixes, built on a recent partial-code tool called Grapa, which enables the analysis of partial programs by the complete code tool called WALA.

Proceedings Article

Do Research and Practice of Code Smell Identification Walk Together? A Social Representations Analysis

Rafael Maiani de Mello, +8 more

TL;DR: There is a considerable gap between the research of smell identification and its practice, and the theory of social representations may be useful to characterize the actual concerns of software developers.

Proceedings Article

Towards Surgically-Precise Technical Debt Estimation: Early Results and Research Roadmap

Valentina Lenarduzzi, +3 more

- 02 Aug 2019 -

arXiv: Software Engineering

TL;DR: In this paper, the authors focus on relatively simple regression modeling techniques and apply them to modeling the additional project cost connected to the sub-optimal conditions existing in the projects under study.

Proceedings Article

An Empirical Study of Code Smells in Transformer-based Code Generation Techniques

Mohammed Latif Siddiq, +4 more

TL;DR: To investigate to what extent code smells are present in the datasets of code generation techniques and verify whether they leak into the output of these techniques, Pylint and Bandit were used.

Proceedings Article

A preliminary study on the adequacy of static analysis warnings with respect to code smell prediction

Savanna Lujan, +4 more

TL;DR: The main finding of the study is that the warnings given by the considered tools lead the performance of code smell prediction models to increase drastically with respect to what was reported by previous research in the field.

References

Journal Article

Random Forests

Leo Breiman

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Journal Article

LIBSVM: A library for support vector machines

Chih-Chung Chang, +1 more

- 06 May 2011 -

ACM Transactions on Intelligent Systems ...

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

Book

C4.5: Programs for Machine Learning

J. Ross Quinlan

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.

Journal Article

The WEKA data mining software: an update

Mark Hall, +5 more

- 16 Nov 2009 -

Sigkdd Explorations

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

Journal Article

SMOTE: synthetic minority over-sampling technique

Nitesh V. Chawla, +3 more

- 01 Jan 2002 -

Journal of Artificial Intelligence Resea...

TL;DR: This article describes a method of over-sampling the minority class by creating synthetic minority class examples; the method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
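
The idea of creating synthetic minority class examples can be illustrated with a minimal Python sketch. It is a simplified, standard-library-only illustration and not the reference implementation from the SMOTE article: the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`), and the toy data are all hypothetical, and the base sample for each synthetic point is drawn at random rather than by iterating over every minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (simplified sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        # Assumption: pick the base sample at random (a simplification).
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric features (hypothetical data).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing rows.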
