
Data Corruption

About: Data Corruption is a research topic. Over the lifetime, 435 publications have been published within this topic, receiving 6,784 citations.


Papers
Proceedings ArticleDOI
10 Aug 2008
TL;DR: This work presents a generic and elegant approach by using a highly fault-secure algebraic structure that is compatible with finite fields and rings and preserves its error detection property throughout addition and multiplication.
Abstract: So far many software countermeasures against fault attacks have been proposed. However, most of them are tailored to a specific cryptographic algorithm or focus on securing the processed data only. In this work we present a generic and elegant approach by using a highly fault-secure algebraic structure. This structure is compatible with finite fields and rings and preserves its error detection property throughout addition and multiplication. Additionally, we introduce a method to generate a fingerprint of the instruction sequence. Thus, it is possible to check the result for data corruption as well as for modifications in the program flow. This is even possible if the order of the instructions is randomized. Furthermore, the properties of the countermeasure allow the deployment of error detection as well as error diffusion. We point out that the overhead for the calculations and for the error checking within this structure is reasonable and that the transformations are efficient. In addition, we discuss how our approach increases the security in various kinds of fault scenarios.

20 citations
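
The structure described above is related in spirit to arithmetic error-detecting codes. As a purely illustrative sketch (not the paper's specific ring construction, and without its instruction fingerprinting), the Python below shows an AN code: every value is multiplied by a constant A, sums of codewords remain valid codewords, products need one correction step, and a fault that leaves a non-multiple of A is flagged at decode time. The constant A and the helper names are assumptions made for this example.

```python
A = 97  # illustrative code constant; a real design chooses A for fault coverage

def encode(x):
    return A * x

def decode(c):
    if c % A != 0:                      # any nonzero residue signals a corrupted value
        raise ValueError(f"fault detected (residue {c % A})")
    return c // A

def an_add(c1, c2):
    return c1 + c2                      # sum of codewords is again a codeword

def an_mul(c1, c2):
    return (c1 * c2) // A               # remove the extra factor of A from the product

# compute 3 * 4 + 5 = 17 entirely on codewords
result = an_add(an_mul(encode(3), encode(4)), encode(5))
assert decode(result) == 17

# a bit flip in an intermediate value is caught at decode time
corrupted = result ^ 0b100
try:
    decode(corrupted)
except ValueError as e:
    print(e)                            # fault detected (residue ...)
```

The point of the sketch is only the invariant: correct computations stay inside the code, while typical faults fall outside it and are detected when the result is checked.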

28 Jul 2017
TL;DR: Potential approaches to the storage and querying of Linked Data using distributed ledgers, with varying degrees of decentralisation and guarantees of integrity, are described, and their a priori differences in performance, storage limitations and reliability are discussed.
Abstract: Distributed ledger platforms based on blockchains provide a fully distributed form of data storage which can guarantee data integrity. Certain use cases, such as medical applications, can benefit from guarantees that the results of arbitrary queries against a Linked Dataset faithfully represent its contents as originally published, without tampering or data corruption. We describe potential approaches to the storage and querying of Linked Data with varying degrees of decentralisation and guarantees of integrity, using distributed ledgers, and discuss their a priori differences in performance, storage limitations and reliability, setting out a programme for future empirical research.

19 citations
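
As a rough illustration of the integrity guarantee discussed above (not any specific ledger platform's API), the sketch below hashes a canonicalised Linked Dataset and records the digest in an append-only list standing in for a distributed ledger; a consumer can then check that the copy it queried has not been tampered with. All names and the toy triples are hypothetical.

```python
import hashlib

def dataset_digest(triples):
    """Hash a canonical form of the dataset (sorted N-Triples-like lines)."""
    canonical = "\n".join(sorted(triples)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

ledger = []  # stand-in for a blockchain: an append-only list of published digests

def publish(triples):
    digest = dataset_digest(triples)
    ledger.append(digest)               # in practice: a ledger transaction
    return digest

def verify(triples, digest):
    return dataset_digest(triples) == digest and digest in ledger

triples = [
    '<http://ex.org/alice> <http://ex.org/knows> <http://ex.org/bob> .',
    '<http://ex.org/bob> <http://ex.org/age> "42" .',
]
d = publish(triples)
assert verify(triples, d)               # an untampered copy checks out
assert not verify(triples[:1], d)       # a modified copy is rejected
```

The design choices the paper weighs (how much data goes on-chain versus off-chain, and who re-computes the digest) all sit behind the `publish` and `verify` steps sketched here.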

Proceedings ArticleDOI
01 Dec 2016
TL;DR: In this paper, file-level and block-level (with various block sizes) pipelining is proposed to overlap data transfer and checksum computation in GridFTP, which can improve the overall data transfer time with end-to-end integrity verification.
Abstract: The scale of scientific data generated by experimental facilities and simulations on high-performance computing facilities has been growing rapidly. In many cases, this data needs to be transferred rapidly and reliably to remote facilities for storage, analysis, sharing, etc. At the same time, users want to verify the integrity of the data by doing a checksum after the data has been written to disk at the destination, to ensure the file has not been corrupted, for example due to network or storage data corruption, software bugs or human error. This end-to-end integrity verification creates additional overhead (extra disk I/O and more computation) and increases the overall data transfer time. In this paper, we evaluate strategies to maximize the overlap between data transfer and checksum computation. More specifically, we evaluate file-level and block-level (with various block sizes) pipelining to overlap data transfer and checksum computation. We evaluate these pipelining approaches in the context of GridFTP, a widely used protocol for science data transfers. We conducted both theoretical analysis and real experiments to evaluate our methods. The results show that block-level pipelining is an effective method in maximizing the overlap between data transfer and checksum computation and can improve the overall data transfer time with end-to-end integrity verification by up to 70% compared to the sequential execution of transfer and checksum, and by up to 60% compared to file-level pipelining.

19 citations
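
The block-level idea can be pictured with a small sketch, assuming a generic receive callback rather than GridFTP itself: one thread writes incoming blocks to disk while a second thread folds completed blocks into a running SHA-256, so the transfer of block i+1 overlaps the checksumming of block i instead of the two phases running back to back. The function `recv_block`, the block size, and the queue depth are illustrative assumptions.

```python
import hashlib
import queue
import threading

BLOCK_SIZE = 4 * 1024 * 1024  # illustrative block size

def transfer_with_checksum(recv_block, dst_path):
    """Write incoming blocks to dst_path while a second thread folds each
    completed block into a running SHA-256, overlapping I/O and hashing."""
    blocks = queue.Queue(maxsize=4)     # bounded buffer between the two stages
    digest = hashlib.sha256()

    def checksummer():
        while True:
            block = blocks.get()
            if block is None:           # sentinel: transfer finished
                break
            digest.update(block)

    worker = threading.Thread(target=checksummer)
    worker.start()
    with open(dst_path, "wb") as out:
        while True:
            block = recv_block(BLOCK_SIZE)   # e.g. read from a socket
            if not block:
                break
            out.write(block)                 # writing block i+1 ...
            blocks.put(block)                # ... overlaps hashing of block i
    blocks.put(None)
    worker.join()
    return digest.hexdigest()               # compare against the sender's checksum
```

File-level pipelining would instead overlap the checksum of one whole file with the transfer of the next file, which is why it helps less for a single large file.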

Journal ArticleDOI
01 Aug 2017
TL;DR: This paper addresses the coverage and efficiency problems of data cleaning by introducing CleanM (pronounced clean'em), a language which can express multiple types of cleaning operations, and validates the applicability of CleanM on top of CleanDB, a newly designed and implemented framework which can query heterogeneous data.
Abstract: Data cleaning has become an indispensable part of data analysis due to the increasing amount of dirty data. Data scientists spend most of their time preparing dirty data before it can be used for data analysis. At the same time, the existing tools that attempt to automate the data cleaning procedure typically focus on a specific use case and operation. Still, even such specialized tools exhibit long running times or fail to process large datasets. Therefore, from a user's perspective, one is forced to use a different, potentially inefficient tool for each category of errors. This paper addresses the coverage and efficiency problems of data cleaning. It introduces CleanM (pronounced clean'em), a language which can express multiple types of cleaning operations. CleanM goes through a three-level translation process for optimization purposes; a different family of optimizations is applied in each abstraction level. Thus, CleanM can express complex data cleaning tasks, optimize them in a unified way, and deploy them in a scale-out fashion. We validate the applicability of CleanM by using it on top of CleanDB, a newly designed and implemented framework which can query heterogeneous data. When compared to existing data cleaning solutions, CleanDB a) covers more data corruption cases, b) scales better, and can handle cases for which its competitors are unable to terminate, and c) uses a single interface for querying and for data cleaning.

19 citations
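
To make the categories of cleaning work concrete, the snippet below applies value normalisation, a simple validity rule, and duplicate elimination to a few dirty rows. This is plain Python with invented rules and data, not CleanM syntax; it only illustrates the kinds of operations a unified cleaning language has to cover.

```python
# Toy dirty data: inconsistent casing/whitespace, a duplicate, and a non-numeric age.
rows = [
    {"name": "Alice ", "age": "34"},
    {"name": "alice",  "age": "34"},
    {"name": "Bob",    "age": "thirty"},
]

def normalize(row):
    """Syntactic repair: trim and lowercase names, parse ages when possible."""
    return {"name": row["name"].strip().lower(),
            "age": int(row["age"]) if row["age"].isdigit() else None}

seen, clean, rejected = set(), [], []
for row in rows:
    r = normalize(row)
    key = (r["name"], r["age"])
    if r["age"] is None:
        rejected.append(row)        # violates the "age is numeric" rule
    elif key not in seen:
        seen.add(key)
        clean.append(r)             # first occurrence kept; later duplicates dropped

print(clean)     # [{'name': 'alice', 'age': 34}]
print(rejected)  # [{'name': 'Bob', 'age': 'thirty'}]
```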

Proceedings ArticleDOI
09 Aug 2010
TL;DR: The design of a recovery system that helps administrators recover from data corruption caused by bugs in web applications is described, and the results show that the system enables recovery from data corruption without loss of critical data and incurs small runtime overhead.
Abstract: Web-based applications store their data at the server side. This design has several benefits, but it can also cause a serious problem because a misconfiguration, bug or vulnerability leading to data loss or corruption can affect many users. While data backup solutions can help resolve some of these issues, they do not help diagnose the events that led to the corruption or the precise set of changes caused by these events. In this paper, we describe the design of a recovery system that helps administrators recover from data corruption caused by bugs in web applications. Our system tracks application requests, helping identify requests that cause data corruption, and reuses undo logs already kept by databases to selectively recover from the effects of these requests. The main challenge is to correlate requests across the multiple tiers of the application to determine the correct recovery actions. We explore using dependencies both within and across requests at three layers (database, application, and client) to help identify data corruption accurately. We evaluate our system using known bugs in popular web applications, including Wordpress, Drupal and Gallery2. Our results show that our system enables recovery from data corruption without loss of critical data and incurs small runtime overhead.

18 citations
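
A heavily simplified sketch of the selective-undo idea, assuming an in-memory key-value store instead of a real database and hypothetical request IDs: each request records before-images of the rows it touches, and rolling back a buggy request restores those before-images, recursing into requests that later depended on its writes. The actual system instead reuses the database's own undo logs and correlates requests across the client, application, and database tiers, which this toy code does not attempt.

```python
from collections import defaultdict

db = {"post:1": {"title": "Hello", "body": "First post"}}
undo_log = defaultdict(list)    # request_id -> [(key, before_image), ...]
dependents = defaultdict(set)   # request_id -> requests that later read its writes

def apply_request(request_id, key, new_value):
    undo_log[request_id].append((key, db.get(key)))   # record the before-image
    db[key] = new_value

def rollback(request_id, visited=None):
    """Undo one request and, transitively, the requests that depended on it."""
    visited = visited if visited is not None else set()
    if request_id in visited:
        return
    visited.add(request_id)
    for dep in dependents[request_id]:
        rollback(dep, visited)
    for key, before in reversed(undo_log[request_id]):
        if before is None:
            db.pop(key, None)      # the request created the row; remove it
        else:
            db[key] = before       # restore the pre-request value

apply_request("req-42", "post:1", {"title": "Hello", "body": ""})  # buggy request blanks the body
rollback("req-42")
assert db["post:1"]["body"] == "First post"
```

The hard part the paper focuses on, identifying which requests are tainted in the first place, corresponds to how the `dependents` relation would be populated from database, application, and client-level dependencies.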


Network Information
Related Topics (5)
Network packet: 159.7K papers, 2.2M citations, 82% related
Software: 130.5K papers, 2M citations, 81% related
Wireless sensor network: 142K papers, 2.4M citations, 78% related
Wireless network: 122.5K papers, 2.1M citations, 77% related
Cluster analysis: 146.5K papers, 2.9M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2022: 1
2021: 21
2020: 25
2019: 27
2018: 27
2017: 27