Data Corruption

About: Data Corruption is a research topic. Over its lifetime, 435 publications have been published within this topic, receiving 6,784 citations.


Papers
Proceedings ArticleDOI
27 May 2015
TL;DR: This work quantitatively analyzes the use of feral mechanisms for maintaining database integrity in a range of open source applications written using the Ruby on Rails ORM and finds that feral invariants are the most popular means of ensuring integrity.
Abstract: The rise of data-intensive "Web 2.0" Internet services has led to a range of popular new programming frameworks that collectively embody the latest incarnation of the vision of Object-Relational Mapping (ORM) systems, albeit at unprecedented scale. In this work, we empirically investigate modern ORM-backed applications' use and disuse of database concurrency control mechanisms. Specifically, we focus our study on the common use of feral, or application-level, mechanisms for maintaining database integrity, which, across a range of ORM systems, often take the form of declarative correctness criteria, or invariants. We quantitatively analyze the use of these mechanisms in a range of open source applications written using the Ruby on Rails ORM and find that feral invariants are the most popular means of ensuring integrity (and, by usage, are over 37 times more popular than transactions). We evaluate which of these feral invariants actually ensure integrity (by usage, up to 86.9%) and which---due to concurrency errors and lack of database support---may lead to data corruption (the remainder), which we experimentally quantify. In light of these findings, we present recommendations for database system designers for better supporting these modern ORM programming patterns, thus eliminating their adverse effects on application integrity.

66 citations
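The check-then-act race behind feral invariants can be illustrated with a minimal sketch (Python here as a stand-in for the Rails ORM; all names are hypothetical). Two threads both pass an application-level uniqueness check before either write commits, violating the invariant that a database-level constraint would have enforced:

```python
import threading
import time

rows = []  # stands in for a database table with no unique constraint

def feral_insert(email):
    # Feral (application-level) uniqueness invariant: check, then insert.
    if email not in rows:   # validation read
        time.sleep(0.01)    # window in which another thread passes the same check
        rows.append(email)  # both writes commit: a duplicate slips in

t1 = threading.Thread(target=feral_insert, args=("a@example.com",))
t2 = threading.Thread(target=feral_insert, args=("a@example.com",))
t1.start(); t2.start()
t1.join(); t2.join()
print(rows)  # both inserts pass the check, so the "unique" email appears twice
```

Rails' `validates_uniqueness_of` performs exactly this read-then-write sequence, which is why the paper finds that such invariants only hold when backed by a database-level mechanism such as a unique index.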

Patent
22 Jul 2002
TL;DR: The patent presents a system and method for testing the integrity of data transmitted to and from a target device through a data connection. The method generally includes creating one or more test threads and, for each test thread, generating a data load on the data connection by repetitively writing test data patterns to the target device and reading data patterns back from it using a synchronous I/O dispatch method.

Abstract: Embodiments of the present invention generally provide a system and method for testing the integrity of data transmitted to and from a target device through a data connection. In one embodiment, the method generally includes creating one or more test threads. The method further includes, for each test thread, generating a data load on the data connection by repetitively writing test data patterns to the target device and reading data patterns from the target device using a synchronous I/O dispatch method, measuring data throughput to and from the target device while generating the data load, and comparing the data patterns read from the target device to the test data patterns to detect data corruptions. The method may further include generating debug information if a data corruption is detected by one of the test threads.

64 citations
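The per-thread write/read/compare loop described above can be sketched as follows (a simplified illustration that uses plain files as the target; the paths and function names are hypothetical, and the real method issues synchronous I/O dispatches against a device):

```python
import os
import tempfile
import threading

def integrity_worker(path, pattern, iterations, errors):
    # Generate a data load: repetitively write a known test pattern,
    # read it back, and compare against the original to detect corruption.
    for i in range(iterations):
        with open(path, "wb") as f:
            f.write(pattern)
            f.flush()
            os.fsync(f.fileno())  # force the write down the data path
        with open(path, "rb") as f:
            if f.read() != pattern:
                errors.append((path, i))  # miscompare: record debug info

tmpdir = tempfile.mkdtemp()
errors = []
threads = [
    threading.Thread(
        target=integrity_worker,
        args=(os.path.join(tmpdir, "target_%d.bin" % n),
              bytes([0xA5 ^ n]) * 4096,  # distinct pattern per thread
              50, errors))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("corruptions detected:", len(errors))
```

On a healthy data path the miscompare list stays empty; the patent's method additionally measures throughput during the load and emits debug information when a thread records a miscompare.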

Patent
31 Jul 2008
TL;DR: In this patent, a RAID controller uses a method to identify a storage device of a redundant array of storage devices that is returning corrupt data to the RAID controller. The method includes reading data from a location of each storage device in the redundant array a first time, and detecting that at least one storage device returned corrupt data.
Abstract: A RAID controller uses a method to identify a storage device of a redundant array of storage devices that is returning corrupt data to the RAID controller. The method includes reading data from a location of each storage device in the redundant array a first time, and detecting that at least one storage device returned corrupt data. In response to detecting corrupt data, steps are performed for each storage device in the redundant array. The steps include reading data from the location of the storage device a second time without writing to the location in between the first and second reads, comparing the data read the first and second times, and identifying the storage device as a failing storage device if the compared data has a miscompare. Finally, the method includes updating the location of each storage device to a new location and repeating the steps for the new location.

64 citations
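The read-twice identification procedure can be sketched in Python (a toy simulation; the `Disk` class and all names are assumptions, and a real controller would detect the initial corruption via parity or checksums rather than by construction):

```python
class Disk:
    """Simulated member of a redundant array."""
    def __init__(self, block, failing=False):
        self.block = block
        self.failing = failing
        self._reads = 0

    def read(self, location):
        self._reads += 1
        if self.failing:
            # A failing device returns different corrupt data on each read.
            return b"corrupt-%d" % self._reads
        return self.block

def identify_failing_device(disks, location):
    # Step 1: read the location on each device once. (In the real method,
    # a parity/checksum mismatch signals that *some* device is corrupt.)
    first = [d.read(location) for d in disks]
    # Step 2: re-read each device, with no write in between, and compare.
    for idx, d in enumerate(disks):
        if d.read(location) != first[idx]:
            return idx  # miscompare between the two reads -> failing device
    return None

disks = [Disk(b"data0"), Disk(b"data1", failing=True), Disk(b"data2")]
print(identify_failing_device(disks, 0))  # identifies disk 1
```

The key design point is that no write happens between the two reads, so a stable device must return identical data; only a device with an unstable (failing) data path produces a miscompare.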

Proceedings ArticleDOI
18 Jun 2012
TL;DR: This work proposes the first RDC schemes that provide robustness and, at the same time, support dynamic updates, while requiring small, constant, client storage, and overcomes the drawback of a high communication cost for updates.
Abstract: Remote Data Checking (RDC) allows clients to efficiently check the integrity of data stored at untrusted servers. This allows data owners to assess the risk of outsourcing data in the cloud, making RDC a valuable tool for data auditing. A robust RDC scheme incorporates mechanisms to mitigate arbitrary amounts of data corruption. In particular, protection against small corruptions (i.e., bytes or even bits) ensures that attacks that modify a few bits do not destroy an encrypted file or invalidate authentication information. Early RDC schemes have focused on static data, whereas later schemes such as DPDP support the full range of dynamic operations on the outsourced data, including insertions, modifications, and deletions. Robustness is required for both static and dynamic RDC schemes that rely on spot checking for efficiency. However, under an adversarial setting there is a fundamental tension between efficient dynamic updates and the encoding required to achieve robustness, because updating even a small portion of the file may require retrieving the entire file. We identify the challenges that need to be overcome when trying to add robustness to a DPDP scheme. We propose the first RDC schemes that provide robustness and, at the same time, support dynamic updates, while requiring small, constant, client storage. Our first construction is efficient in encoding, but has a high communication cost for updates. Our second construction overcomes this drawback through a combination of techniques that includes RS codes based on Cauchy matrices, decoupling the encoding for robustness from the position of symbols in the file, and reducing insert/delete operations to append/modify operations when updating the RS-encoded parity data.

62 citations
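Spot checking — the efficiency mechanism whose robustness the encoding must protect — can be sketched with per-block HMAC tags (a simplified model with constant client storage, not any specific RDC scheme; all names are hypothetical):

```python
import hashlib
import hmac
import random

def make_tag(key, index, block):
    # Per-block verification tag. The client stores only `key` (small,
    # constant client storage); blocks and tags live at the server.
    return hmac.new(key, str(index).encode() + b"|" + block,
                    hashlib.sha256).digest()

def spot_check(key, server_blocks, server_tags, sample_size):
    # Challenge a random sample of positions instead of the whole file.
    # Cheap, but a corruption of a few bytes may escape any one check --
    # which is why robust schemes layer erasure coding on top.
    for i in random.sample(range(len(server_blocks)), sample_size):
        expected = make_tag(key, i, server_blocks[i])
        if not hmac.compare_digest(expected, server_tags[i]):
            return False  # corruption detected at position i
    return True

key = b"client-secret-key"
blocks = [bytes([b]) * 64 for b in range(10)]
tags = [make_tag(key, i, blk) for i, blk in enumerate(blocks)]
print(spot_check(key, blocks, tags, sample_size=4))  # honest server passes
```

This also makes the paper's tension concrete: updating one block under a robust scheme is hard because the erasure-coded parity spans many blocks, so a naive update can require retrieving far more than the block being changed.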

Journal ArticleDOI
TL;DR: This work introduces the blockchain to record the interactions among users, service providers, and organizers in the data auditing process as evidence, and employs smart contracts to detect service disputes, thereby forcing the untrusted organizer to honestly identify malicious service providers.
Abstract: Network storage services have benefited countless users worldwide due to their convenience, economy, and high availability. Since a single service provider is not always reliable enough, more complex multi-cloud storage systems have been developed to mitigate the risk of data corruption. However, a data auditing scheme is still needed in multi-cloud storage to help users confirm the integrity of their outsourced data. Unfortunately, most existing schemes rely on trusted institutions such as a centralized third-party auditor (TPA) and the cloud service organizer, and it is difficult to identify malicious service providers after service disputes. Therefore, we present a blockchain-based multi-cloud storage data auditing scheme that protects data integrity and accurately arbitrates service disputes. We not only introduce the blockchain to record the interactions among users, service providers, and organizers in the data auditing process as evidence, but also employ smart contracts to detect service disputes, thereby forcing the untrusted organizer to honestly identify malicious service providers. We also use the blockchain network and homomorphic verifiable tags to achieve low-cost batch verification without a TPA. Theoretical analyses and experiments show that the scheme is effective in multi-cloud environments and that its cost is acceptable.

58 citations
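The idea behind homomorphic verifiable tags, which make the low-cost batch verification possible, can be illustrated with a toy additive scheme (not the paper's construction and not cryptographically secure; the parameters are arbitrary assumptions):

```python
# Toy additive-homomorphic tag: tag(m) = (alpha * m) mod p.
p = 2**61 - 1      # large prime modulus (toy parameter)
alpha = 1234567    # secret tagging key held by the data owner

def tag(m):
    return (alpha * m) % p

blocks = [17, 42, 99, 256]     # file blocks, as integers for illustration
tags = [tag(m) for m in blocks]

# Batch verification: instead of checking each (block, tag) pair, the
# verifier checks a single aggregated equation over all of them.
agg_tag = sum(tags) % p
agg_msg = sum(blocks) % p
print(agg_tag == (alpha * agg_msg) % p)  # True: tags aggregate homomorphically
```

Because the tags combine homomorphically, one aggregate check covers every block in the challenge, which is what keeps verification cheap even without a third-party auditor.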


Network Information

Related Topics (5)
- Network packet: 159.7K papers, 2.2M citations, 82% related
- Software: 130.5K papers, 2M citations, 81% related
- Wireless sensor network: 142K papers, 2.4M citations, 78% related
- Wireless network: 122.5K papers, 2.1M citations, 77% related
- Cluster analysis: 146.5K papers, 2.9M citations, 76% related
Performance Metrics
No. of papers in the topic in previous years:

Year   Papers
2022   1
2021   21
2020   25
2019   27
2018   27
2017   27