Journal ArticleDOI

Data reconciliation and fusion methods: A survey

20 Jul 2020 - Applied Computing and Informatics (no longer published by Elsevier)
TL;DR: This paper addresses the resolution of conflicts at the instance level in two stages, reference reconciliation and data fusion. It first defines a classification of conflicts, then the strategies for dealing with them, and finally how conflict-management strategies are implemented.
About: This article is published in Applied Computing and Informatics. The article was published on 2020-07-20 and is currently open access. It has received 8 citations to date. The article focuses on the topics: Data integration & Sensor fusion.
Citations


21 May 2013
TL;DR: This video shows an example of an application developed in the GLIHM course unit (UE) of the Master Informatique, E-services specialization.
Abstract: This video shows an example of an application developed in the GLIHM course unit (UE) of the Master Informatique, E-services specialization. The 'I'm in' application lets users create and share events on the Lille 1 campus. Each user can define centres of interest, which makes it possible to…

252 citations

Journal ArticleDOI
TL;DR: Data reconciliation is a data-preprocessing method that improves the accuracy of measured data through process modelling and optimization and, together with statistical tests, can be applied to gross error detection; gross errors in an olefin plant were successfully detected and then validated by on-site inspection.

7 citations
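The reconciliation described above is typically posed as a weighted least-squares adjustment of measurements subject to process-model constraints, with a chi-square test on the constraint residuals used to flag gross errors. Below is a minimal sketch of that idea, assuming a linear balance model A·x = 0 and known measurement variances; the flowsheet, numbers, and function names are illustrative, not the paper's.

```python
# A minimal data-reconciliation sketch (hypothetical example, not the paper's code).
# Measurements x with covariance V are adjusted so that A @ x_hat = 0 holds;
# the global chi-square statistic on the residuals hints at gross errors.
import numpy as np

def reconcile(x, V, A):
    """Weighted least-squares reconciliation subject to A @ x_hat = 0."""
    r = A @ x                                      # constraint residuals
    S = A @ V @ A.T                                # residual covariance
    x_hat = x - V @ A.T @ np.linalg.solve(S, r)    # adjusted measurements
    chi2 = r @ np.linalg.solve(S, r)               # global test statistic
    return x_hat, chi2

# Toy splitter: stream 1 splits into streams 2 and 3, so x1 - x2 - x3 = 0.
A = np.array([[1.0, -1.0, -1.0]])
x = np.array([100.0, 60.0, 35.0])                  # raw flow measurements
V = np.diag([1.0, 1.0, 1.0])                       # measurement variances
x_hat, chi2 = reconcile(x, V, A)
print(x_hat, chi2)  # adjusted flows close the balance; a large chi2 suggests a gross error
```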

Journal ArticleDOI
TL;DR: Digital receipts provide high levels of confidence because of their completeness, accuracy, ease of use, efficiency, simplicity, and suitability for use by MSMEs.
Abstract: Introduction/Main Objectives: This research analyzes the use of digital receipts and multi-platform e-commerce data integration, and their influence on the reconciliation process and the preparation of financial reports by micro, small, and medium-sized enterprises (MSMEs). Background Problems: Multi-platform online transactions require data validation and conversion, which can be an issue during reconciliation and the preparation of financial reports. The data-integration issue stems from differences in the interfaces used for the multi-platform transactions and from the MSMEs' internal capabilities. Novelty: The integration and treatment of online transactions across platforms should feed internal records or a database for the MSMEs, the reconciliation process, and the preparation of financial reports. Research Methods: This research uses a quantitative method with partial least squares structural equation modeling (PLS-SEM) analysis and descriptive analysis to reveal the reconciliation process and the conditions under which financial reports are prepared. Findings/Results: Problems in integrating data from the various platforms into internal reports cause duplicated internal reporting, a long reporting process, and a greater chance of errors. The digital receipt is treated as proof of a manual transaction. Record duplication is a technical issue that delays processing and reconciliation. None of the MSMEs have a machine-to-machine (automated) reconciliation process. Conclusion: Transaction receipts from all the platforms affect the recording of transactions, the reconciliation process, and the preparation of financial reports. Digital receipts provide high levels of confidence because of their completeness, accuracy, ease of use, efficiency, simplicity, and suitability for MSMEs. The adoption of online sales and payments by MSMEs is highly effective, yet it has not been followed up with the integration of the data into the reconciliation and accounting process for preparing financial reports.

5 citations


Additional excerpts

  • ...(Abboora et al., 2016; Ertz et al., 2016; Bakhtouchi, 2019; Charlesworth, 2018:307; Shakr & Zomaya, 2019:314)....


Proceedings ArticleDOI
23 Jul 2021
TL;DR: In this paper, a novel entity association relationship modeling approach driven by dynamic detecting probes is proposed to solve the problem of integrating and fusing scattered and heterogeneous data in the process of enterprise data space construction.
Abstract: To solve the problem of integrating and fusing scattered and heterogeneous data in the process of enterprise data space construction, we propose a novel entity association relationship modeling approach driven by dynamic detecting probes. By deploying acquisition units between the business logic layer and data access layer of different applications and dynamically collecting key information such as global data structure, related data and access logs, the entity association model for enterprise data space is constructed from three levels: schema, instance, and log. At the schema association level, a multidimensional similarity discrimination algorithm combined with semantic analysis is used to achieve the rapid fusion of similar entities; at the instance association level, a combination of feature vector-based similarity analysis and deep learning is used to complete the association matching of different entities for structured data such as numeric and character data and unstructured data such as long text data; at the log association level, the association between different entities and attributes is established by analyzing the equivalence relationships in the data access logs. In addition, to address the uncertainty problem in the association construction process, a fuzzy logic-based inference model is applied to obtain the final entity association construction scheme.

2 citations
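At the instance association level, the pipeline described above starts from feature-vector similarity before applying deep learning. The sketch below shows only that first step, assuming entities from two systems have already been encoded as numeric feature vectors; the vectors and the threshold are made up for illustration.

```python
# A minimal sketch of instance-level association via feature-vector similarity
# (illustrative only; the paper combines this idea with deep learning).
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def match_entities(feats_a, feats_b, threshold=0.9):
    """Pair up entities from two sources whose feature vectors are similar."""
    pairs = []
    for i, u in enumerate(feats_a):
        for j, v in enumerate(feats_b):
            s = cosine(u, v)
            if s >= threshold:
                pairs.append((i, j, s))
    return pairs

a = np.array([[0.9, 0.1, 0.3], [0.2, 0.8, 0.5]])   # entities from system A
b = np.array([[0.88, 0.12, 0.31]])                  # entities from system B
print(match_entities(a, b))  # [(0, 0, 0.999...)] -> likely the same entity
```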

Proceedings ArticleDOI
26 Jan 2023
TL;DR: In this article, an efficient decision procedure to determine data constraint equivalence is presented, along with two lightweight analyses to refute and prove the equivalence, respectively, both of which are proven to run in polynomial time.
Abstract: Data constraints are widely used in FinTech systems for monitoring data consistency and diagnosing anomalous data manipulations. However, many equivalent data constraints are created redundantly during the development cycle, slowing down the FinTech systems and causing unnecessary alerts. We present EQDAC, an efficient decision procedure to determine data constraint equivalence. We first propose a symbolic representation for semantic encoding and then introduce two lightweight analyses to refute and prove the equivalence, respectively, both of which are proven to run in polynomial time. We evaluate EQDAC on 30,801 data constraints in a FinTech system. It is shown that EQDAC detects 11,538 equivalent data constraints in three hours. It also supports efficient equivalence searching with an average time cost of 1.22 seconds, enabling the system to check new data constraints upon submission.

1 citation
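EQDAC's refute/prove analyses are symbolic, but the "refute" half is easy to illustrate: two constraints are inequivalent as soon as some record satisfies one and violates the other. The sketch below uses random testing as a stand-in for that refutation step, with made-up record fields; it can only find counterexamples, never prove equivalence.

```python
# A lightweight refutation sketch: two data constraints are inequivalent if some
# record satisfies one but not the other. (EQDAC's actual procedure is symbolic;
# random testing here only illustrates the "refute" direction, heuristically.)
import random

def refute_equivalence(c1, c2, trials=10_000, seed=0):
    """Search for a counterexample record on which c1 and c2 disagree."""
    rng = random.Random(seed)
    for _ in range(trials):
        row = {"amount": rng.randint(-1000, 1000), "fee": rng.randint(0, 100)}
        if c1(row) != c2(row):
            return row      # witness of inequivalence
    return None             # no counterexample found (equivalence not proven)

c1 = lambda r: r["amount"] - r["fee"] >= 0
c2 = lambda r: r["amount"] >= r["fee"]   # algebraically the same constraint
c3 = lambda r: r["amount"] > r["fee"]    # differs exactly when amount == fee
print(refute_equivalence(c1, c2))  # None: no disagreement found
print(refute_equivalence(c1, c3))  # a record with amount == fee, if one is sampled
```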

References
Journal ArticleDOI
TL;DR: A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events.
Abstract: A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events (said to be matched). A comparison is to be made between the recorded characteristics and values in two records (one from each file) and a decision made as to whether or not the members of the comparison pair represent the same person or event, or whether there is insufficient evidence to justify either of these decisions at stipulated levels of error. These three decisions are referred to as a link (A1), a non-link (A3), and a possible link (A2). The first two decisions are called positive dispositions. The two types of error are defined as the error of making decision A1 when the members of the comparison pair are in fact unmatched, and the error of making decision A3 when the members of the comparison pair are in fact matched. The probabilities of these errors are denoted μ and λ, respectively.

2,306 citations
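This is the classic Fellegi-Sunter model of record linkage. Its decision rule is simple to state: sum per-field agreement weights log(m/u), or log((1-m)/(1-u)) on disagreement, and compare the total against two thresholds derived from the tolerated error levels. A minimal sketch follows; the m/u probabilities, fields, and thresholds are invented for illustration.

```python
# A minimal Fellegi-Sunter sketch: per-field agreement weights log(m/u) are
# summed and compared against two thresholds to decide link (A1), possible
# link (A2), or non-link (A3). The m/u values and thresholds below are made up.
import math

M = {"name": 0.95, "dob": 0.90, "zip": 0.85}   # P(fields agree | records matched)
U = {"name": 0.01, "dob": 0.05, "zip": 0.10}   # P(fields agree | records unmatched)
UPPER, LOWER = 6.0, 0.0                         # decision thresholds

def decide(agreements):
    """agreements: dict field -> bool, whether the two records agree on it."""
    w = 0.0
    for field, agrees in agreements.items():
        m, u = M[field], U[field]
        w += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    if w >= UPPER:
        return "A1: link"
    if w <= LOWER:
        return "A3: non-link"
    return "A2: possible link"

print(decide({"name": True, "dob": True, "zip": True}))    # strong evidence -> link
print(decide({"name": True, "dob": False, "zip": False}))  # mixed evidence -> possible link
```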

Journal ArticleDOI
TL;DR: This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data Fusion.
Abstract: The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation. This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.

1,797 citations


"Data reconciliation and fusion meth..." refers background in this paper

  • ...Data conflicts can be classified into two classes: (a) attribute value uncertainty, when information is missing, and (b) contradictions, when the attributes have different values [1,51]....


  • ...Since then, even though the problem has received less attention, some techniques have been proposed [51]....


  • ...Other approaches define new operators and combine them with existing ones [51]....

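The survey above [51] treats fusion as resolving attribute-level conflicts among duplicate records using per-attribute strategies (for example, take the most recent, the longest, or the most frequent value), in the spirit of the relational fusion operators it classifies. A minimal sketch, with invented strategies and record layout:

```python
# A minimal fusion sketch in the spirit of [51]: duplicate records for one
# real-world object are merged attribute by attribute, each attribute using a
# conflict-resolution strategy (strategy names and layout are illustrative).
from collections import Counter

def latest(values):   return max(values, key=lambda v: v["ts"])["val"]    # most recent
def longest(values):  return max((v["val"] for v in values), key=len)     # most detailed
def vote(values):     return Counter(v["val"] for v in values).most_common(1)[0][0]

STRATEGIES = {"phone": latest, "address": longest, "city": vote}

def fuse(records):
    """Merge duplicate records (dicts of attr -> {'val', 'ts'}) into one clean tuple."""
    fused = {}
    for attr, resolve in STRATEGIES.items():
        values = [r[attr] for r in records if attr in r and r[attr]["val"]]
        fused[attr] = resolve(values) if values else None
    return fused

r1 = {"phone": {"val": "111", "ts": 1}, "address": {"val": "5 Main St", "ts": 1}, "city": {"val": "Lille", "ts": 1}}
r2 = {"phone": {"val": "222", "ts": 2}, "address": {"val": "5 Main Street, Apt 2", "ts": 2}, "city": {"val": "Lille", "ts": 2}}
print(fuse([r1, r2]))  # {'phone': '222', 'address': '5 Main Street, Apt 2', 'city': 'Lille'}
```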

Journal ArticleDOI
TL;DR: This paper presents an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database and covers similarity metrics that are commonly used to detect similar field entries.
Abstract: Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this paper, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and with a brief discussion of the big open problems in the area.

1,778 citations
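The field-similarity metrics this survey covers can be approximated with the standard library for illustration: compare records field by field, average the similarities, and threshold the result. The sketch below uses difflib's SequenceMatcher as a stand-in for the edit-distance-style metrics; the fields and the threshold are made up.

```python
# A minimal duplicate-detection sketch: normalized similarity per field
# (stdlib difflib stands in for the metrics surveyed), averaged over fields
# and thresholded. The example records and threshold are illustrative.
from difflib import SequenceMatcher

def field_sim(a, b):
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_duplicate(rec1, rec2, threshold=0.85):
    """rec1, rec2: dicts with the same string-valued fields."""
    sims = [field_sim(rec1[f], rec2[f]) for f in rec1]
    return sum(sims) / len(sims) >= threshold

r1 = {"name": "Jon Smith",  "street": "5 Main St."}
r2 = {"name": "John Smith", "street": "5 Main Street"}
print(is_duplicate(r1, r2))  # True: transcription variants of one record
```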

Journal ArticleDOI
TL;DR: A comprehensive review of the data fusion state of the art is proposed, exploring its conceptualizations, benefits, and challenging aspects, as well as existing methodologies.

1,684 citations

Journal Article
TL;DR: This work classifies data quality problems that are addressed by data cleaning and provides an overview of the main solution approaches and discusses current tool support for data cleaning.
Abstract: We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.

1,675 citations
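Data cleaning of the kind classified here usually begins with standardization: uniform casing and whitespace, canonical phone and date formats, so that downstream matching compares like with like. A minimal sketch, with invented rules and record fields:

```python
# A minimal cleaning sketch of the kind surveyed: standardize formats and strip
# noise before integration. The rules and record fields here are illustrative.
import re
from datetime import datetime

def clean_record(rec):
    out = dict(rec)
    out["name"] = re.sub(r"\s+", " ", rec["name"]).strip().title()  # normalize spacing/case
    out["phone"] = re.sub(r"\D", "", rec["phone"])                  # keep digits only
    # Normalize several incoming date formats to ISO 8601; if none match,
    # the original value is kept for manual review.
    for fmt in ("%d/%m/%Y", "%Y-%m-%d", "%d %b %Y"):
        try:
            out["date"] = datetime.strptime(rec["date"], fmt).date().isoformat()
            break
        except ValueError:
            pass
    return out

print(clean_record({"name": "  jane   DOE ", "phone": "+33 (0)6-11-22", "date": "20/07/2020"}))
# name title-cased, phone digits-only, date -> '2020-07-20'
```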