Proceedings ArticleDOI

Detecting code smells using machine learning techniques: Are we there yet?

TLDR
The results show that, with this dataset configuration, the machine learning techniques exhibit critical limitations in the state of the art which deserve further research.
Abstract
Code smells are symptoms of poor design and implementation choices that weigh heavily on the quality of produced source code. Over the last decades, several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature and approach of the detection. In a recent work, the use of Machine-Learning (ML) techniques for code smell detection has been proposed, possibly solving the issue of tool subjectivity by giving a learner the ability to discern between smelly and non-smelly source code elements. While this work opened a new perspective for code smell detection, it only considered the case where each dataset used to train and test the machine learners contains instances affected by a single type of smell. In this work we replicate the study with a different dataset configuration containing instances of more than one type of smell. The results show that, with this configuration, the machine learning techniques exhibit critical limitations in the state of the art which deserve further research.


Citations

A GQM-based Method and a Bayesian Approach for the Detection of Code and Design Smells

TL;DR: In this paper, a probabilistic model is proposed to detect occurrences of code and design smells in programs, such as the Blob antipattern; the model can be calibrated using machine learning techniques to offer improved, context-specific detection.
Journal ArticleDOI

Machine learning techniques for code smell detection: A systematic literature review and meta-analysis

TL;DR: There is still room for the improvement of machine learning techniques in the context of code smell detection and it is argued that JRip and Random Forest are the most effective classifiers in terms of performance.
Journal ArticleDOI

Beyond Technical Aspects: How Do Community Smells Influence the Intensity of Code Smells?

TL;DR: A mixed-methods empirical study of 117 releases from 9 open-source systems finds that community-related factors contribute to the intensity of code smells, supporting the joint use of community and code smells detection as a mechanism for the joint management of technical and social problems around software development communities.
Proceedings ArticleDOI

Comparing heuristic and machine learning approaches for metric-based code smell detection

TL;DR: A large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection, and considers five code smell types and compares machine learning models with DECOR, a state-of-the-art heuristics-based approach.
Proceedings ArticleDOI

Deep learning based feature envy detection

TL;DR: A novel deep learning based approach to detecting feature envy, one of the most common code smells, is proposed, together with an automatic approach to generating labeled training data for the neural network based classifier that does not require any human intervention.
References
Journal ArticleDOI

Classification model for code clones based on machine learning

TL;DR: A classification model is proposed that applies machine learning to each individual user's judgments regarding code clones; it showed more than 70% accuracy on average and more than 90% accuracy for some particular users and projects.
Book ChapterDOI

Anti-Pattern Detection: Methods, Challenges, and Open Issues

TL;DR: From the analysis of the state-of-the-art, this chapter will derive a set of guidelines for building and evaluating recommendation systems supporting the detection of anti-patterns and discuss some problems that are still open, to trace future research directions in the field.
Journal ArticleDOI

Developing Fault-Prediction Models: What the Research Can Show Industry

TL;DR: A systematic review of the research literature on fault-prediction models from 2000 through 2010 identified 36 studies that sufficiently defined their models and development context and methodology and quantitatively analyzed 19 studies and the 206 models they presented.
Proceedings ArticleDOI

How do Scratch Programmers Name Variables and Procedures

TL;DR: The results of the analysis show that Scratch programmers often prefer longer identifier names than developers in other languages, that Scratch procedures have even longer names than Scratch variables, and that the naming patterns found support this claim.
BookDOI

Perspectives on the Future of Software Engineering

TL;DR: The proposal in this paper is to work towards a scientific basis for software engineering by capturing more such time-lagging dependencies among software artifacts in the form of empirical models and thereby making developers aware of so-called “cognitive laws” that must be adhered to.