Showing papers by "Michael J. Pazzani" published in 2020


Journal Article
TL;DR: An accurate and efficient algorithm for missing data reconstruction (imputation) that is specifically designed to recover off-period segments of missing data. A caching approach reduces the search space and improves the computational complexity to linear in the common case.
Abstract: Noise and missing data are intrinsic characteristics of real-world data, leading to uncertainty that negatively affects the quality of knowledge extracted from the data. The burden imposed by missing data is often severe in sensors that collect data from the physical world, where large gaps of missing data may occur when the system is temporarily off or disconnected. How can we reconstruct missing data for these periods? We introduce an accurate and efficient algorithm for missing data reconstruction (imputation) that is specifically designed to recover off-period segments of missing data. This algorithm, Ghost, searches the sequential dataset to find data segments whose prior and posterior segments match those of the missing data. If a similar segment also satisfies a constraint (such as location or time of day), it is substituted for the missing data. A baseline approach results in quadratic computational complexity; therefore, we introduce a caching approach that reduces the search space and improves the computational complexity to linear in the common case. Experimental evaluations on five real-world datasets show that our algorithm significantly outperforms four state-of-the-art algorithms, with an average of 18 percent higher F-score.
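The matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the Ghost implementation: it assumes a categorical sequence in which missing values are `None`, requires exact equality of the neighbouring windows, and omits the location or time-of-day constraint. The names `find_gap`, `build_index`, `impute_gap` and the window size `w` are hypothetical. The dictionary built in a single pass stands in for the caching idea, so that the lookup for a gap does not rescan the whole sequence.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Value = Optional[Hashable]
Key = Tuple[tuple, tuple]


def find_gap(seq: Sequence[Value]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first run of None values, or None if there is no gap."""
    start = None
    for i, v in enumerate(seq):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            return start, i
    return (start, len(seq)) if start is not None else None


def build_index(seq: Sequence[Value], gap_len: int, w: int) -> Dict[Key, int]:
    """Map (prior window, posterior window) to the start of a fully observed segment.

    Assumption: only the earliest matching segment is kept for each key.
    """
    index: Dict[Key, int] = {}
    for s in range(w, len(seq) - gap_len - w + 1):
        # Skip candidates whose context or body contains missing values.
        if None in seq[s - w : s + gap_len + w]:
            continue
        key = (tuple(seq[s - w : s]), tuple(seq[s + gap_len : s + gap_len + w]))
        index.setdefault(key, s)
    return index


def impute_gap(seq: Sequence[Value], w: int = 2) -> List[Value]:
    """Fill the first gap with a segment whose neighbours match the gap's neighbours."""
    out = list(seq)
    gap = find_gap(seq)
    if gap is None:
        return out
    start, end = gap
    # The gap needs w observed values on both sides to form its context.
    if start < w or end + w > len(seq):
        return out
    prior = tuple(seq[start - w : start])
    posterior = tuple(seq[end : end + w])
    if None in prior or None in posterior:
        return out
    match = build_index(seq, end - start, w).get((prior, posterior))
    if match is not None:
        out[start:end] = seq[match : match + (end - start)]
    return out


readings = ["a", "b", "x", "y", "c", "d", "e", "a", "b", None, None, "c", "d"]
filled = impute_gap(readings, w=2)
```

Here the gap is preceded by `a, b` and followed by `c, d`, the same context that surrounds `x, y` earlier in the sequence, so the sketch copies `x, y` into the gap. When no segment has a matching context, the sequence is returned unchanged.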

18 citations