Toward semantic data imputation for a dengue dataset

doi:10.1016/J.KNOSYS.2020.105803

Journal Article

N. Kamkhad, +3 more

21 May 2020

Knowledge-Based Systems

Vol. 196, Article 105803

Abstract:

Missing data are a major problem for the data analysis techniques used in forecasting. Simple traditional methods, e.g., mean and mode substitution, perform poorly when predicting missing values. In this paper, we present and discuss a novel method of imputing missing values semantically with the use of an ontology model. We make three new contributions to the field. First, we improve the efficiency of predicting missing data by applying Particle Swarm Optimization (PSO) to the numerical data cleansing problem, with K-means used to help determine the fitness value and thereby enhance the performance of PSO. Second, we incorporate an ontology into PSO to narrow the search space, so that PSO predicts missing numerical values more accurately while converging quickly on the answer. Third, we provide a framework for substituting nominal data that are missing from the dataset, using the relationships between concepts and a reasoning mechanism over the knowledge-based model. The experimental results indicated that the proposed method estimated missing data more efficiently and with lower error than conventional methods, as measured by the root mean square error.
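The first contribution can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading of "K-means to help determine the fitness value", namely that a candidate value is scored by how close the completed row lies to its nearest K-means centroid. The toy data, the function names (`kmeans`, `fitness`, `pso_impute`), and the PSO coefficients are all hypothetical. The search bounds `lo` and `hi` are taken from the observed column range here; in the abstract's second contribution, an ontology would presumably supply a narrower range.

```python
import math
import random

def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric rows."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda j: math.dist(r, centroids[j]))
            groups[nearest].append(r)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def fitness(value, row, missing_idx, centroids):
    """Assumed fitness: distance from the completed row to its nearest centroid."""
    filled = list(row)
    filled[missing_idx] = value
    return min(math.dist(filled, c) for c in centroids)

def pso_impute(row, missing_idx, centroids, lo, hi,
               n_particles=15, iters=40, seed=0):
    """One-dimensional PSO over candidate values in [lo, hi]; lower fitness wins."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = list(pos)
    best_fit = [fitness(p, row, missing_idx, centroids) for p in pos]
    g = min(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g], best_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # illustrative inertia and acceleration weights
    for _ in range(iters):
        for i in range(n_particles):
            vel[i] = (w * vel[i]
                      + c1 * rng.random() * (best_pos[i] - pos[i])
                      + c2 * rng.random() * (g_pos - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to the search range
            f = fitness(pos[i], row, missing_idx, centroids)
            if f < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], f
                if f < g_fit:
                    g_pos, g_fit = pos[i], f
    return g_pos

# Toy data (hypothetical): two well-separated groups of complete rows.
complete = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
            [10.0, 10.1], [9.8, 10.2], [10.2, 9.9]]
incomplete = [9.9, None]  # second attribute is missing

centroids = kmeans(complete, k=2)
column = [r[1] for r in complete]
imputed = pso_impute(incomplete, 1, centroids, min(column), max(column))
print(round(imputed, 2))
```

On this toy data the search should settle near the second coordinate of the centroid of the upper group, since that choice places the completed row closest to a cluster centre. A tighter `[lo, hi]` range leaves the swarm less space to explore, which is the intuition behind using domain knowledge to narrow the search.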
