KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining
Isaac Triguero,Sergio González,Jose M. Moyano,Salvador García,Jesús Alcalá-Fdez,Julián Luengo,Alberto Fernández,María José del Jesus,Luciano Sánchez,Francisco Herrera +9 more
Reads0
Chats0
TLDR
The most recent components added to KEEL 3.0 are described, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery, which greatly improve the versatility of KEEL to deal with more modern data mining problems.Abstract:
This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to performdata management, design of multiple kind of experiments, statistical analyses, etc. This framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring data partitions and algorithms’ results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility of KEEL to deal with more modern data mining problems.read more
Citations
More filters
Journal ArticleDOI
A survey on semi-supervised learning
TL;DR: This survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work.
Journal ArticleDOI
Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
Giang Nguyen,Stefan Dlugolinsky,Martin Bobák,Viet Tran,Álvaro López García,Ignacio Heredia,Peter Malik,Ladislav Hluchý +7 more
TL;DR: This survey presents a recent time-slide comprehensive overview with comparisons as well as trends in development and usage of cutting-edge Artificial Intelligence software that is capable of scaling computation effectively and efficiently in the era of Big Data.
Journal ArticleDOI
A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities
TL;DR: The performance of 14 different bagging and boosting based ensembles, including XGBoost, LightGBM and Random Forest, is empirically analyzed in terms of predictive capability and efficiency.
Journal ArticleDOI
Handling data irregularities in classification: Foundations, trends, and future challenges
TL;DR: This article provides a bird's eye view of data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities, and discusses the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities.
References
More filters
Book
Data Mining: Concepts and Techniques
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Journal Article
Statistical Comparisons of Classifiers over Multiple Data Sets
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
BookDOI
An introduction to statistical learning
TL;DR: An introduction to statistical learning provides an accessible overview of the essential toolset for making sense of the vast and complex data sets that have emerged in science, industry, and other sectors in the past twenty years.
Journal ArticleDOI
A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
TL;DR: A taxonomy for ensemble-based methods to address the class imbalance where each proposal can be categorized depending on the inner ensemble methodology in which it is based is proposed and a thorough empirical comparison is developed by the consideration of the most significant published approaches to show whether any of them makes a difference.