Open Access Journal Article

KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining

TLDR
The most recent components added to KEEL 3.0 are described, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery, which greatly improve the versatility of KEEL to deal with more modern data mining problems.
Abstract
This paper introduces the third major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools for data management, the design of multiple kinds of experiments, statistical analyses, etc. This framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring data partitions and algorithms’ results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility of KEEL to deal with more modern data mining problems.
