KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining

doi:10.2991/IJCIS.10.1.82

Journal Article (Open Access)

Isaac Triguero, +9 more

- 26 Sep 2017 -

International Journal of Computational I...

- Vol. 10, Iss: 1, pp 1238-1249

TLDR

The most recent components added to KEEL 3.0 are described, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery, which greatly improve the versatility of KEEL to deal with more modern data mining problems.

Abstract:

This paper introduces the third major release of the KEEL software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools for data management, the design of multiple kinds of experiments, statistical analysis, etc. The framework also contains KEEL-dataset, a data repository for multiple learning tasks that provides data partitions and the results of algorithms on these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute the algorithms included in KEEL. These new features greatly improve the versatility of KEEL for dealing with more modern data mining problems.

Citations

Data Mining - Concepts and Techniques.

Petra Perner

Journal Article

A survey on semi-supervised learning

Jesper E. van Engelen, +2 more

- 01 Feb 2020 -

Machine Learning

TL;DR: This survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work.

Journal Article

Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

Giang Nguyen, +7 more

- 19 Jan 2019 -

Artificial Intelligence Review

TL;DR: This survey presents a recent, comprehensive time-slice overview, including comparisons and trends in development and usage, of cutting-edge Artificial Intelligence software capable of scaling computation effectively and efficiently in the era of Big Data.

Journal Article

A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities

Sergio González, +4 more

- 01 Dec 2020 -

Information Fusion

TL;DR: The performance of 14 different bagging and boosting based ensembles, including XGBoost, LightGBM and Random Forest, is empirically analyzed in terms of predictive capability and efficiency.

Journal Article

Handling data irregularities in classification: Foundations, trends, and future challenges

Swagatam Das, +2 more

- 01 Sep 2018 -

Pattern Recognition

TL;DR: This article provides a bird's eye view of data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities, and discusses the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities.
