Journal Article

Toward semantic data imputation for a dengue dataset

Abstract
Missing data are a major problem for the data analysis techniques used in forecasting. Traditional methods that rely on simple techniques, such as mean or mode substitution, predict missing values poorly. In this paper, we present and discuss a novel method that imputes missing values semantically using an ontology model. We make three new contributions to the field. First, we improve the efficiency of predicting missing data with Particle Swarm Optimization (PSO), applied to the numerical data cleansing problem, and enhance the performance of PSO by using K-means to help determine the fitness value. Second, we incorporate an ontology with PSO to narrow the search space, so that PSO predicts numerical missing values more accurately while converging quickly on the answer. Third, we provide a framework that substitutes nominal data lost from the dataset by using the relationships between concepts and a reasoning mechanism over the knowledge-based model. The experimental results indicated that the proposed method estimated missing data more efficiently and with lower error than conventional methods, as measured by the root mean square error (RMSE).
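The numerical part of the method (a PSO search for a missing value, with K-means helping to score candidates) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the fitness is assumed to be the distance from the completed record to its nearest K-means centroid, the bounds `lo`/`hi` merely stand in for the ontology-narrowed search space, and the helper names (`kmeans`, `fitness`, `pso_impute`, `mean_impute`, `rmse`), the PSO parameters and the toy data are all hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means over complete records; returns the centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def fitness(record, idx, value, centroids):
    """Assumed fitness: distance of the completed record to its nearest centroid (lower is better)."""
    filled = list(record)
    filled[idx] = value
    return min(math.dist(filled, c) for c in centroids)


def pso_impute(record, idx, centroids, lo, hi, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """1-D PSO over [lo, hi]; a tighter [lo, hi] plays the role of the
    ontology-restricted search space (hypothetical stand-in)."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]  # each particle's best position so far
    pbest_f = [fitness(record, idx, x, centroids) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g], pbest_f[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # clamp to the search bounds
            f = fitness(record, idx, pos[i], centroids)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i], f
    return gbest


def mean_impute(complete, idx):
    """Conventional baseline: fill with the column mean of complete records."""
    return sum(r[idx] for r in complete) / len(complete)


def rmse(true_vals, pred_vals):
    """Root mean square error between ground truth and imputed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))
                     / len(true_vals))


# Toy data (hypothetical): two well-separated groups of complete records.
complete = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
            [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
incomplete = [[1.0, None], [5.0, None]]  # second attribute is missing
truth = [1.0, 5.0]  # hidden values, used only for scoring

centroids = kmeans(complete, k=2)
col = [r[1] for r in complete]
lo, hi = min(col), max(col)  # observed range as the search space
pso_vals = [pso_impute(r, 1, centroids, lo, hi) for r in incomplete]
mean_vals = [mean_impute(complete, 1)] * len(incomplete)
print("PSO  RMSE:", rmse(truth, pso_vals))
print("mean RMSE:", rmse(truth, mean_vals))
```

On this toy data the mean baseline fills both gaps with the column average (about 3.02), while the PSO search settles near the centroid of each record's own cluster, so its RMSE is much lower. Narrowing `lo`/`hi` further, as the ontology is meant to do, shrinks the region the swarm has to explore.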


Citations
Journal Article

Nearest neighbor imputation for categorical data by weighting of attributes

01 May 2022
TL;DR: In this article, a weighted nearest neighbor approach is proposed to impute missing values in categorical variables in high-dimensional datasets; it explicitly uses information on the association among attributes.
Journal Article

Missing Value Imputation Designs and Methods of Nature-Inspired Metaheuristic Techniques: A Systematic Review

01 Jan 2022
TL;DR: In this paper, the authors identify and review the existing research on missing value imputation published between 2011 and 2021 in terms of nature-inspired metaheuristic approaches, dataset designs, missingness mechanisms, missing rates, and the most used evaluation metrics. The study revealed that missing at random (MAR) is the most commonly proposed missingness mechanism in the datasets, and that hybridizations of metaheuristics with clustering or neural networks are popular among researchers.