Trends in Cleaning Relational Data: Consistency and Deduplication
Citations
361 citations
Cites background or methods from "Trends in Cleaning Relational Data:..."
...For error detection, many methods rely on violations of integrity constraints [11, 14] or duplicate [27, 37, 40] and outlier detection [19, 29] methods to identify errors....
[...]
...Data cleaning can be separated in two tasks: (i) error detection, where data inconsistencies such as duplicate data, integrity constraint violations, and incorrect or missing data values are identified, and (ii) data repairing, which involves updating the available data to remove any detected…...
[...]
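The two error-detection signals named in the excerpts above, integrity constraint violations and duplicate records, can be illustrated with a minimal sketch. It assumes rows are plain dictionaries and that the constraint is a single functional dependency (zip determines city); the helper names `fd_violations` and `exact_duplicates` and the sample rows are hypothetical and are not taken from the survey or the citing papers.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Return lhs values whose rows disagree on rhs (violations of lhs -> rhs)."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[lhs]].add(row[rhs])
    return {key: vals for key, vals in groups.items() if len(vals) > 1}

def exact_duplicates(rows):
    """Return (first_index, duplicate_index) pairs for rows that are identical."""
    seen = {}
    dups = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            dups.append((seen[key], i))
        else:
            seen[key] = i
    return dups

# Hypothetical sample data, for illustration only.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicago"},
]

violations = fd_violations(rows, "zip", "city")
duplicates = exact_duplicates(rows)
```

This covers only detection, task (i) in the excerpt. Deciding which of the conflicting city values to keep belongs to data repairing and is not attempted here.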
298 citations
Cites background from "Trends in Cleaning Relational Data:..."
...Most of the materials for this tutorial can be found in Foundations and Trends in Databases [41], and in an overview of the SampleClean project [51]....
[...]
...Foundations and Trends in Databases, 5(4):281–393, 2015....
[...]
...Most of the materials in the first part of the tutorial come from our survey in Foundations and Trends in Databases [41]....
[...]
...Foundations and Trends in Databases, 6(1-2):1–161, 2013....
[...]
180 citations
Cites background from "Trends in Cleaning Relational Data:..."
...There have been several surveys on classifying data errors [6, 17, 19, 24, 29]....
[...]
...[19] consider errors as violations of qualitative rules and patterns, such as denial constraints [6]....
[...]
129 citations
Cites background from "Trends in Cleaning Relational Data:..."
...provide an overview on standard consistency definitions [23]....
[...]
109 citations
Cites background from "Trends in Cleaning Relational Data:..."
...In such a case, there are likely to be more inconsistencies between data sources [18, 27] and cases where the actual truth is a matter of perspective [15]....
[...]
References
20,309 citations
"Trends in Cleaning Relational Data:..." refers to methods in this paper
...Recently, cleaning large datasets was implemented using a distributed framework, such as MapReduce [35], or Spark [127]....
[...]
17,663 citations
8,811 citations
5,227 citations
"Trends in Cleaning Relational Data:..." refers to methods in this paper
...A classifier-independent approach to derive the uncertainty is done by measuring the disagreement among the predictions of a set of classifiers, also known as a committee [100]....
[...]
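The committee-based uncertainty described in the excerpt above can be sketched as follows. The excerpt does not say which disagreement measure is used; vote entropy is assumed here as one common choice. The committee of three threshold rules over a similarity score, and the names `vote_entropy` and `committee_disagreement`, are hypothetical stand-ins for trained classifiers.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's label votes; 0 means full agreement."""
    total = len(votes)
    counts = Counter(votes)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

def committee_disagreement(committee, item):
    """Disagreement among the committee members' predictions for one item."""
    return vote_entropy([clf(item) for clf in committee])

# Hypothetical committee: each member labels a record pair as a match when
# its similarity score reaches that member's own threshold.
committee = [lambda score, t=t: score >= t for t in (0.3, 0.5, 0.7)]

# Hypothetical similarity scores for three candidate record pairs.
scores = [0.9, 0.6, 0.1]
most_uncertain = max(scores, key=lambda s: committee_disagreement(committee, s))
```

The measure depends only on the members' predicted labels, not on any one model's internal confidence, which is what makes the approach classifier independent. Here the pair with score 0.6 splits the committee two to one and is therefore the most uncertain.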