Proceedings Article
Data validation for business continuity planning
Soujanya Soni, Sameep Mehta, Sandeep Hans +2 more
- pp 72-77
TLDR
A metadata-driven, rule-based data validation system, which is domain independent, distributed, scalable and can easily accommodate changes in business requirements, is employed.
Abstract:
In this paper we present a system and case study for business data validation in large organizations. Validated, consistent data makes it possible to handle outages and incidents in a more principled fashion and supports business continuity. Typically, different business units employ separate systems to produce and store their data, and data owners choose their own database technology. Keeping the data consistent across business units is therefore a non-trivial task. The lack of consistent data can lead to suboptimal planning during outages, and organizations can incur huge financial costs. Traditional custom data validation systems fetch the data from the various data sources and pass it through a central validation system, resulting in huge data transfer costs. Moreover, accommodating changes in business rules is a laborious process that can require re-design and re-development of the system, which is very costly and time-consuming. In this paper, we employ a metadata-driven, rule-based data validation system that is domain independent, distributed, and scalable, and that can easily accommodate changes in business requirements. We have deployed our system in real-life settings and present some of the results in this paper.
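As a rough illustration of what "metadata-driven, rule-based" validation could look like, the sketch below keeps the rules as plain data and runs them against records held at a single source, returning only the violations. This is not the authors' implementation: the rule vocabulary (`not_null`, `in_set`, `max_length`), the rule dictionary layout, the `Violation` type, and the sample records are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical check vocabulary; the abstract does not list the rule types used.
CHECKS: Dict[str, Callable[[Any, Dict[str, Any]], bool]] = {
    "not_null": lambda value, params: value is not None and value != "",
    "in_set": lambda value, params: value in params["allowed"],
    "max_length": lambda value, params: value is not None
    and len(str(value)) <= params["limit"],
}


@dataclass
class Violation:
    """One failed rule for one record (assumed reporting format)."""
    rule_id: str
    record_key: Any
    field: str
    value: Any


def validate(records: List[Dict[str, Any]],
             rules: List[Dict[str, Any]],
             key_field: str = "id") -> List[Violation]:
    """Apply every rule to every record and collect the failures.

    Rules are data, so changing a business rule means editing metadata,
    not code. Running this next to each source would mean that only the
    violations, not the full data set, need to be sent to a central place.
    """
    violations: List[Violation] = []
    for rule in rules:
        check = CHECKS[rule["check"]]
        params = rule.get("params", {})
        for record in records:
            value = record.get(rule["field"])
            if not check(value, params):
                violations.append(
                    Violation(rule["id"], record.get(key_field),
                              rule["field"], value)
                )
    return violations


# Illustrative rule metadata and records (made up for this sketch).
RULES = [
    {"id": "R1", "field": "owner", "check": "not_null"},
    {"id": "R2", "field": "tier", "check": "in_set",
     "params": {"allowed": {"gold", "silver"}}},
]

SERVERS = [
    {"id": 1, "owner": "ops", "tier": "gold"},
    {"id": 2, "owner": "", "tier": "bronze"},
]

if __name__ == "__main__":
    for violation in validate(SERVERS, RULES):
        print(violation)
```

Under these assumptions, adding or relaxing a business rule only changes the `RULES` list, which is the property the abstract attributes to a metadata-driven design.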
References
Journal Article
Data Cleaning: Problems and Current Approaches.
Erhard Rahm, Hong Hai Do +1 more
TL;DR: This work classifies data quality problems that are addressed by data cleaning and provides an overview of the main solution approaches and discusses current tool support for data cleaning.
Proceedings Article
The merge/purge problem for large databases
TL;DR: This paper details the sorted neighborhood method that is used by some to solve merge/purge, presents experimental results demonstrating that this approach may work well in practice but at great expense, and shows a means of improving the accuracy of the results based on a multi-pass approach.
Journal Article
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
TL;DR: This paper develops a system for accomplishing this Data Cleansing task and demonstrates its use for cleansing lists of names of potential customers in a direct marketing-type application and reports on the successful implementation for a real-world database that conclusively validates results previously achieved for statistically generated data.
Proceedings Article
The field matching problem: Algorithms and applications
Alvaro Monge, Charles Elkan +1 more
TL;DR: Three field matching algorithms are described, one of which is the well-known Smith-Waterman algorithm for comparing DNA and protein sequences, and their performance on real-world datasets is evaluated.
Proceedings Article
Robust and efficient fuzzy match for online data cleaning
TL;DR: A new similarity function is proposed which overcomes limitations of commonly used similarity functions, and an efficient fuzzy match algorithm is developed which can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation.