
Data Corruption

About: Data Corruption is a research topic. Over the lifetime, 435 publications have been published within this topic, receiving 6,784 citations.


Papers
Proceedings ArticleDOI
10 Aug 2008
TL;DR: This work presents a generic and elegant approach by using a highly fault-secure algebraic structure that is compatible with finite fields and rings and preserves its error detection property throughout addition and multiplication.
Abstract: So far many software countermeasures against fault attacks have been proposed. However, most of them are tailored to a specific cryptographic algorithm or focus on securing the processed data only. In this work we present a generic and elegant approach by using a highly fault-secure algebraic structure. This structure is compatible with finite fields and rings and preserves its error detection property throughout addition and multiplication. Additionally, we introduce a method to generate a fingerprint of the instruction sequence. Thus, it is possible to check the result for data corruption as well as for modifications in the program flow. This is even possible if the order of the instructions is randomized. Furthermore, the properties of the countermeasure allow the deployment of error detection as well as error diffusion. We point out that the overhead for the calculations and for the error checking within this structure is reasonable and that the transformations are efficient. In addition, we discuss how our approach increases the security in various kinds of fault scenarios.

20 citations
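
The structure described above is related in spirit to arithmetic error-detecting codes. As a purely illustrative sketch (not the paper's specific ring construction, and without its instruction fingerprinting), the Python below shows an AN code: every value is multiplied by a constant A, sums of codewords remain valid codewords, products need one correction step, and a fault that leaves a non-multiple of A is flagged at decode time. The constant A and the helper names are assumptions made for this example.

```python
A = 97  # illustrative code constant; a real design chooses A for fault coverage

def encode(x):
    return A * x

def decode(c):
    if c % A != 0:                      # any nonzero residue signals a corrupted value
        raise ValueError(f"fault detected (residue {c % A})")
    return c // A

def an_add(c1, c2):
    return c1 + c2                      # sum of codewords is again a codeword

def an_mul(c1, c2):
    return (c1 * c2) // A               # remove the extra factor of A from the product

# compute 3 * 4 + 5 = 17 entirely on codewords
result = an_add(an_mul(encode(3), encode(4)), encode(5))
assert decode(result) == 17

# a bit flip in an intermediate value is caught at decode time
corrupted = result ^ 0b100
try:
    decode(corrupted)
except ValueError as e:
    print(e)                            # fault detected (residue ...)
```

The point of the sketch is only the invariant: correct computations stay inside the code, while typical faults fall outside it and are detected when the result is checked.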

28 Jul 2017
TL;DR: Potential approaches to the storage and querying of Linked Data using distributed ledgers, with varying degrees of decentralisation and guarantees of integrity, are described, and their a priori differences in performance, storage limitations and reliability are discussed.
Abstract: Distributed ledger platforms based on blockchains provide a fully distributed form of data storage which can guarantee data integrity. Certain use cases, such as medical applications, can benefit from guarantees that the results of arbitrary queries against a Linked Dataset faithfully represent its contents as originally published, without tampering or data corruption. We describe potential approaches to the storage and querying of Linked Data with varying degrees of decentralisation and guarantees of integrity, using distributed ledgers, and discuss their a priori differences in performance, storage limitations and reliability, setting out a programme for future empirical research.

19 citations
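
As a rough illustration of the integrity guarantee discussed above (not any specific ledger platform's API), the sketch below hashes a canonicalised Linked Dataset and records the digest in an append-only list standing in for a distributed ledger; a consumer can then check that the copy it queried has not been tampered with. All names and the toy triples are hypothetical.

```python
import hashlib

def dataset_digest(triples):
    """Hash a canonical form of the dataset (sorted N-Triples-like lines)."""
    canonical = "\n".join(sorted(triples)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

ledger = []  # stand-in for a blockchain: an append-only list of published digests

def publish(triples):
    digest = dataset_digest(triples)
    ledger.append(digest)               # in practice: a ledger transaction
    return digest

def verify(triples, digest):
    return dataset_digest(triples) == digest and digest in ledger

triples = [
    '<http://ex.org/alice> <http://ex.org/knows> <http://ex.org/bob> .',
    '<http://ex.org/bob> <http://ex.org/age> "42" .',
]
d = publish(triples)
assert verify(triples, d)               # an untampered copy checks out
assert not verify(triples[:1], d)       # a modified copy is rejected
```

The design choices the paper weighs (how much data goes on-chain versus off-chain, and who re-computes the digest) all sit behind the `publish` and `verify` steps sketched here.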

Proceedings ArticleDOI
01 Dec 2016
TL;DR: In this paper, file-level and block-level (with various block sizes) pipelining is proposed to overlap data transfer and checksum computation in GridFTP, which can improve the overall data transfer time with end-to-end integrity verification.
Abstract: The scale of scientific data generated by experimental facilities and simulations on high-performance computing facilities has been growing rapidly. In many cases, this data needs to be transferred rapidly and reliably to remote facilities for storage, analysis, sharing, etc. At the same time, users want to verify the integrity of the data by doing a checksum after the data has been written to disk at the destination, to ensure the file has not been corrupted, for example due to network or storage data corruption, software bugs or human error. This end-to-end integrity verification creates additional overhead (extra disk I/O and more computation) and increases the overall data transfer time. In this paper, we evaluate strategies to maximize the overlap between data transfer and checksum computation. More specifically, we evaluate file-level and block-level (with various block sizes) pipelining to overlap data transfer and checksum computation. We evaluate these pipelining approaches in the context of GridFTP, a widely used protocol for science data transfers. We conducted both theoretical analysis and real experiments to evaluate our methods. The results show that block-level pipelining is an effective method in maximizing the overlap between data transfer and checksum computation and can improve the overall data transfer time with end-to-end integrity verification by up to 70% compared to the sequential execution of transfer and checksum, and by up to 60% compared to file-level pipelining.

19 citations
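
The block-level idea can be pictured with a small sketch, assuming a generic receive callback rather than GridFTP itself: one thread writes incoming blocks to disk while a second thread folds completed blocks into a running SHA-256, so the transfer of block i+1 overlaps the checksumming of block i instead of the two phases running back to back. The function `recv_block`, the block size, and the queue depth are illustrative assumptions.

```python
import hashlib
import queue
import threading

BLOCK_SIZE = 4 * 1024 * 1024  # illustrative block size

def transfer_with_checksum(recv_block, dst_path):
    """Write incoming blocks to dst_path while a second thread folds each
    completed block into a running SHA-256, overlapping I/O and hashing."""
    blocks = queue.Queue(maxsize=4)     # bounded buffer between the two stages
    digest = hashlib.sha256()

    def checksummer():
        while True:
            block = blocks.get()
            if block is None:           # sentinel: transfer finished
                break
            digest.update(block)

    worker = threading.Thread(target=checksummer)
    worker.start()
    with open(dst_path, "wb") as out:
        while True:
            block = recv_block(BLOCK_SIZE)   # e.g. read from a socket
            if not block:
                break
            out.write(block)                 # writing block i+1 ...
            blocks.put(block)                # ... overlaps hashing of block i
    blocks.put(None)
    worker.join()
    return digest.hexdigest()               # compare against the sender's checksum
```

File-level pipelining would instead overlap the checksum of one whole file with the transfer of the next file, which is why it helps less for a single large file.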

Journal ArticleDOI
01 Aug 2017
TL;DR: This paper addresses the coverage and efficiency problems of data cleaning by introducing CleanM (pronounced clean'em), a language which can express multiple types of cleaning operations, and validates the applicability of CleanM on top of CleanDB, a newly designed and implemented framework which can query heterogeneous data.
Abstract: Data cleaning has become an indispensable part of data analysis due to the increasing amount of dirty data. Data scientists spend most of their time preparing dirty data before it can be used for data analysis. At the same time, the existing tools that attempt to automate the data cleaning procedure typically focus on a specific use case and operation. Still, even such specialized tools exhibit long running times or fail to process large datasets. Therefore, from a user's perspective, one is forced to use a different, potentially inefficient tool for each category of errors. This paper addresses the coverage and efficiency problems of data cleaning. It introduces CleanM (pronounced clean'em), a language which can express multiple types of cleaning operations. CleanM goes through a three-level translation process for optimization purposes; a different family of optimizations is applied in each abstraction level. Thus, CleanM can express complex data cleaning tasks, optimize them in a unified way, and deploy them in a scale-out fashion. We validate the applicability of CleanM by using it on top of CleanDB, a newly designed and implemented framework which can query heterogeneous data. When compared to existing data cleaning solutions, CleanDB a) covers more data corruption cases, b) scales better, and can handle cases for which its competitors are unable to terminate, and c) uses a single interface for querying and for data cleaning.

19 citations
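
To make the categories of cleaning work concrete, the snippet below applies value normalisation, a simple validity rule, and duplicate elimination to a few dirty rows. This is plain Python with invented rules and data, not CleanM syntax; it only illustrates the kinds of operations a unified cleaning language has to cover.

```python
# Toy dirty data: inconsistent casing/whitespace, a duplicate, and a non-numeric age.
rows = [
    {"name": "Alice ", "age": "34"},
    {"name": "alice",  "age": "34"},
    {"name": "Bob",    "age": "thirty"},
]

def normalize(row):
    """Syntactic repair: trim and lowercase names, parse ages when possible."""
    return {"name": row["name"].strip().lower(),
            "age": int(row["age"]) if row["age"].isdigit() else None}

seen, clean, rejected = set(), [], []
for row in rows:
    r = normalize(row)
    key = (r["name"], r["age"])
    if r["age"] is None:
        rejected.append(row)        # violates the "age is numeric" rule
    elif key not in seen:
        seen.add(key)
        clean.append(r)             # first occurrence kept; later duplicates dropped

print(clean)     # [{'name': 'alice', 'age': 34}]
print(rejected)  # [{'name': 'Bob', 'age': 'thirty'}]
```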

Proceedings ArticleDOI
09 Aug 2010
TL;DR: The design of a recovery system that helps administrators recover from data corruption caused by bugs in web applications is described, and the results show that the system enables recovery from data corruption without loss of critical data and incurs small runtime overhead.
Abstract: Web-based applications store their data at the server side. This design has several benefits, but it can also cause a serious problem because a misconfiguration, bug or vulnerability leading to data loss or corruption can affect many users. While data backup solutions can help resolve some of these issues, they do not help diagnose the events that led to the corruption or the precise set of changes caused by these events. In this paper, we describe the design of a recovery system that helps administrators recover from data corruption caused by bugs in web applications. Our system tracks application requests, helping identify requests that cause data corruption, and reuses undo logs already kept by databases to selectively recover from the effects of these requests. The main challenge is to correlate requests across the multiple tiers of the application to determine the correct recovery actions. We explore using dependencies both within and across requests at three layers (database, application, and client) to help identify data corruption accurately. We evaluate our system using known bugs in popular web applications, including Wordpress, Drupal and Gallery2. Our results show that our system enables recovery from data corruption without loss of critical data and incurs small runtime overhead.

18 citations
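
A heavily simplified sketch of the selective-undo idea, assuming an in-memory key-value store instead of a real database and hypothetical request IDs: each request records before-images of the rows it touches, and rolling back a buggy request restores those before-images, recursing into requests that later depended on its writes. The actual system instead reuses the database's own undo logs and correlates requests across the client, application, and database tiers, which this toy code does not attempt.

```python
from collections import defaultdict

db = {"post:1": {"title": "Hello", "body": "First post"}}
undo_log = defaultdict(list)    # request_id -> [(key, before_image), ...]
dependents = defaultdict(set)   # request_id -> requests that later read its writes

def apply_request(request_id, key, new_value):
    undo_log[request_id].append((key, db.get(key)))   # record the before-image
    db[key] = new_value

def rollback(request_id, visited=None):
    """Undo one request and, transitively, the requests that depended on it."""
    visited = visited if visited is not None else set()
    if request_id in visited:
        return
    visited.add(request_id)
    for dep in dependents[request_id]:
        rollback(dep, visited)
    for key, before in reversed(undo_log[request_id]):
        if before is None:
            db.pop(key, None)      # the request created the row; remove it
        else:
            db[key] = before       # restore the pre-request value

apply_request("req-42", "post:1", {"title": "Hello", "body": ""})  # buggy request blanks the body
rollback("req-42")
assert db["post:1"]["body"] == "First post"
```

The hard part the paper focuses on, identifying which requests are tainted in the first place, corresponds to how the `dependents` relation would be populated from database, application, and client-level dependencies.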


Network Information
Related Topics (5)
Network packet: 159.7K papers, 2.2M citations, 82% related
Software: 130.5K papers, 2M citations, 81% related
Wireless sensor network: 142K papers, 2.4M citations, 78% related
Wireless network: 122.5K papers, 2.1M citations, 77% related
Cluster analysis: 146.5K papers, 2.9M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2022: 1
2021: 21
2020: 25
2019: 27
2018: 27
2017: 27