Proceedings ArticleDOI

Data validation for business continuity planning

TL;DR: The paper employs a metadata-driven, rule-based data validation system that is domain independent, distributed, scalable, and can easily accommodate changes in business requirements.
Abstract
In this paper we present a system and case study for business data validation in large organizations. Validated and consistent data provides the capability to handle outages and incidents in a more principled fashion and supports business continuity. Typically, different business units employ separate systems to produce and store their data, and the data owners choose their own database technology. Keeping data consistent across business units is a non-trivial task, and the lack of consistent data can lead to suboptimal planning during outages, exposing organizations to huge financial costs. A traditional custom data validation system fetches data from various sources and routes it through a central validation system, incurring large data transfer costs. Moreover, accommodating changes in business rules is a laborious process that can require re-design and re-development of the system, which is costly and time consuming. In this paper, we employ a metadata-driven, rule-based data validation system that is domain independent, distributed, scalable, and can easily accommodate changes in business requirements. We have deployed our system in real-life settings and present some of the results in this paper.
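The abstract's central idea — validation rules expressed as metadata rather than code, so that changing business rules requires no re-development — can be sketched as follows. This is a minimal illustration under assumed names; the rule IDs, fields, and operators below are hypothetical, not the paper's actual schema.

```python
# A minimal sketch of a metadata-driven, rule-based validator: rules live
# as data (metadata), not code, so a change in business rules means editing
# the rule table rather than redesigning the system. All rule and field
# names here are hypothetical illustrations.

OPERATORS = {
    "not_null": lambda value, _: value is not None,
    "in_set":   lambda value, allowed: value in allowed,
    "max_len":  lambda value, n: value is None or len(str(value)) <= n,
}

def validate(record, rules):
    """Apply each metadata rule to a record; return the IDs of violated rules."""
    violations = []
    for rule in rules:
        check = OPERATORS[rule["op"]]
        if not check(record.get(rule["field"]), rule.get("arg")):
            violations.append(rule["id"])
    return violations

# Rules are plain data: adding or changing one requires no code changes.
rules = [
    {"id": "R1", "field": "owner",  "op": "not_null"},
    {"id": "R2", "field": "region", "op": "in_set", "arg": {"EU", "US"}},
    {"id": "R3", "field": "name",   "op": "max_len", "arg": 10},
]

print(validate({"owner": "ops", "region": "APAC", "name": "payments-db"}, rules))
# → ['R2', 'R3']
```

Because the rule set is data, it can be stored centrally and shipped to validators running next to each business unit's data store, which is one way to avoid the data transfer costs the abstract attributes to centralized validation.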


References
Journal Article

Data Cleaning: Problems and Current Approaches.

TL;DR: This work classifies the data quality problems addressed by data cleaning, provides an overview of the main solution approaches, and discusses current tool support for data cleaning.
Proceedings ArticleDOI

The merge/purge problem for large databases

TL;DR: This paper details the sorted neighborhood method used by some to solve merge/purge, presents experimental results demonstrating that this approach may work well in practice but at great expense, and shows a means of improving the accuracy of the results based upon a multi-pass approach.
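The sorted neighborhood method summarized above can be sketched briefly: sort the records on a discriminating key, then compare only records that fall within a small sliding window, instead of all O(n²) pairs. The key and the matcher below are simplified stand-ins for illustration, not the paper's actual functions.

```python
# Sketch of the sorted neighborhood method: sort on a key, then compare
# each record only against its neighbors inside a sliding window.

def sorted_neighborhood(records, key, matches, window=3):
    """Return candidate duplicate pairs found within the sliding window."""
    ordered = sorted(records, key=key)
    pairs = []
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            if matches(rec, other):
                pairs.append((rec, other))
    return pairs

people = ["smith, john", "smiht, john", "doe, jane", "doe, jan"]
# Toy matcher: records agree on the first three characters of the key.
dups = sorted_neighborhood(people, key=lambda s: s,
                           matches=lambda a, b: a[:3] == b[:3])
print(dups)
```

The multi-pass variant mentioned in the TL;DR re-runs the same procedure with several different sort keys and unions the candidate pairs, which recovers duplicates that a single key ordering would place far apart.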
Journal ArticleDOI

Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem

TL;DR: This paper develops a system for the data cleansing task, demonstrates its use for cleansing lists of names of potential customers in a direct-marketing application, and reports on a successful implementation for a real-world database that conclusively validates results previously achieved on statistically generated data.
Proceedings Article

The field matching problem: Algorithms and applications

TL;DR: Three field matching algorithms are described, one of which is the well-known Smith-Waterman algorithm for comparing DNA and protein sequences, and their performance on real-world datasets is evaluated.
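The Smith-Waterman algorithm named above is a classic dynamic program for the best-scoring local alignment between two sequences, which is what makes it usable for matching noisy database fields as well as DNA. The scores below (match +2, mismatch −1, gap −1) are arbitrary illustrative choices, not the paper's parameters.

```python
# Smith-Waterman local alignment in its classic dynamic-programming form:
# H[i][j] holds the best score of any local alignment ending at a[i-1], b[j-1];
# the floor of 0 is what makes the alignment local rather than global.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score,  # match / mismatch
                          H[i - 1][j] + gap,        # gap in b
                          H[i][j - 1] + gap)        # gap in a
            best = max(best, H[i][j])
    return best

# "deprtment" drops one letter: nine matches (+18) minus one gap (-1).
print(smith_waterman("department", "deprtment"))
# → 17
```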
Proceedings ArticleDOI

Robust and efficient fuzzy match for online data cleaning

TL;DR: A new similarity function is proposed which overcomes limitations of commonly used similarity functions, and an efficient fuzzy match algorithm is developed which can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation.
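The general fuzzy-match pattern described above — clean an incoming tuple by finding its nearest neighbor in a reference relation when no exact match exists — can be sketched as follows. Note that the paper proposes its own, more robust similarity function; plain Levenshtein edit distance here is a stand-in for illustration, and the example data is invented.

```python
# Generic fuzzy match against a reference relation: return the exact match
# if present, otherwise the reference tuple at the smallest edit distance.
# Levenshtein distance is a stand-in for the paper's similarity function.

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(tuple_in, reference):
    """Clean an incoming tuple by snapping it to the nearest reference tuple."""
    if tuple_in in reference:
        return tuple_in
    return min(reference, key=lambda ref: edit_distance(tuple_in, ref))

reference = ["Boeing Company", "Bon Corporation", "Bing Inc"]
print(fuzzy_match("Beoing Company", reference))
# → Boeing Company
```

Scanning the whole reference relation per tuple is quadratic in practice; the paper's contribution includes an efficient algorithm that avoids exactly this brute-force scan.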