
Showing papers on "Data Corruption published in 2008"


Journal ArticleDOI
TL;DR: This article presents the first large-scale study of data corruption, which analyzes corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months.
Abstract: An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this article, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur most frequently. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances, including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.

312 citations
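
The detection mechanism behind these mismatch counts is checksum-on-write, verify-on-read. A minimal sketch of that path (the block size, CRC choice, and in-memory store are illustrative assumptions, not the paper's implementation):

```python
import zlib

BLOCK_SIZE = 4096

def write_block(store, addr, data):
    """Store a block together with a checksum computed at write time."""
    store[addr] = (data, zlib.crc32(data))

def read_block(store, addr):
    """Recompute the checksum on read; a mismatch flags silent corruption."""
    data, stored_crc = store[addr]
    if zlib.crc32(data) != stored_crc:
        raise IOError(f"checksum mismatch at block {addr}")
    return data

store = {}
write_block(store, 0, b"x" * BLOCK_SIZE)
data, crc = store[0]
store[0] = (b"y" + data[1:], crc)   # simulate a silent bit flip on the media
try:
    read_block(store, 0)
except IOError as e:
    print(e)                        # detected: checksum mismatch at block 0
```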


Proceedings Article
26 Feb 2008
TL;DR: This work uses model checking to evaluate whether common protection techniques used in parity-based RAID systems are sufficient in light of the increasingly complex failure modes of modern disk drives and identifies a parity pollution problem that spreads corrupt data across multiple disks, thus leading to data loss or corruption.
Abstract: RAID storage systems protect data from storage errors, such as data corruption, using a set of one or more integrity techniques, such as checksums. The exact protection offered by certain techniques or a combination of techniques is sometimes unclear. We introduce and apply a formal method of analyzing the design of data protection strategies. Specifically, we use model checking to evaluate whether common protection techniques used in parity-based RAID systems are sufficient in light of the increasingly complex failure modes of modern disk drives. We evaluate the approaches taken by a number of real systems under single-error conditions, and find flaws in every scheme. In particular, we identify a parity pollution problem that spreads corrupt data (the result of a single error) across multiple disks, thus leading to data loss or corruption. We further identify which protection measures must be used to avoid such problems. Finally, we show how to combine real-world failure data with the results from the model checker to estimate the actual likelihood of data loss of different protection strategies.

102 citations
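
To see why parity pollution is destructive, consider a single-parity XOR stripe: once parity is recomputed from an already-corrupt block, reconstruction faithfully returns the corrupt value. A toy sketch of the failure mode the paper identifies (not its model checker):

```python
from functools import reduce

def xor_parity(blocks):
    """XOR all blocks together, byte by byte (RAID-5-style single parity)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripe = [b"\x01" * 4, b"\x02" * 4, b"\x04" * 4]
parity = xor_parity(stripe)

stripe[1] = b"\xff" * 4        # silent corruption of one data block
# A later parity recomputation reads the corrupt block and "repairs" parity:
parity = xor_parity(stripe)    # parity pollution: corruption is now consistent

# Reconstructing block 1 from the survivors returns the corrupt value,
# so the original data can no longer be recovered from parity alone.
rebuilt = xor_parity([stripe[0], stripe[2], parity])
assert rebuilt == b"\xff" * 4
print("reconstruction returns the corrupted block: data loss")
```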


Proceedings ArticleDOI
24 Jun 2008
TL;DR: A method is developed for using data diversity in N-variant systems to provide high-assurance arguments against a class of data corruption attacks, demonstrated with a case study that thwarts attacks corrupting UID values.
Abstract: Unlike other diversity-based approaches, N-variant systems thwart attacks without requiring secrets. Instead, they use redundancy (to require an attacker to simultaneously compromise multiple variants with the same input) and tailored diversity (to make it impossible to compromise all the variants with the same input for given attack classes). In this work, we develop a method for using data diversity in N-variant systems to provide high-assurance arguments against a class of data corruption attacks. Data is transformed in the variants so identical concrete data values have different interpretations. In order to corrupt the data without detection, an attacker would need to alter the corresponding data in each variant in a different way while sending the same inputs to all variants. We demonstrate our approach with a case study that thwarts attacks that corrupt UID values.

79 citations
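
A hedged sketch of the data-diversity idea applied to UID values: each variant stores the UID under a different encoding, so a single concrete value injected into both variants decodes to two different UIDs and the divergence is detected. The two encodings below (identity and XOR-complement) are illustrative assumptions, not the paper's transformations:

```python
# Two variants encode the same logical UID differently; a memory-corruption
# attack writes one concrete value into both address spaces.
ENCODE = [lambda uid: uid,            # variant 0: identity encoding
          lambda uid: uid ^ 0xFFFF]   # variant 1: complemented encoding

def decode(variant, value):
    return ENCODE[variant](value)     # both encodings are involutions

uid = 1000
stored = [ENCODE[0](uid), ENCODE[1](uid)]

attacker_value = 0                    # attacker tries to force UID 0 (root)
stored = [attacker_value, attacker_value]

views = [decode(v, s) for v, s in enumerate(stored)]
if views[0] != views[1]:
    print("divergence detected: variants disagree on UID", views)
```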


Patent
17 Jun 2008
TL;DR: In this article, a controller is disclosed that is adapted to control read/write access to a storage media, which includes data corruption detection logic to reconstruct a logical block address (LBA) lookup table from metadata stored at the storage media upon restart and re-initialization after a power loss event.
Abstract: In a particular embodiment, a controller is disclosed that is adapted to control read/write access to a storage media. The controller includes data corruption detection logic to reconstruct a logical block address (LBA) lookup table from metadata stored at the storage media upon restart and re-initialization after a power loss event. The controller further includes duplicate conflict resolution logic to identify a valid data block from multiple data blocks that refer to a single LBA. The duplicate conflict resolution logic counts a first number of valid physical pages and a second number of different sectors in each of the multiple data blocks. The duplicate conflict resolution logic selects the valid data block from the multiple data blocks based on at least one of the first and second numbers.

74 citations
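
A minimal sketch of the reconstruction-plus-conflict-resolution step (the field names and the single tie-break on valid-page counts are simplifying assumptions; the patent also counts distinct sectors):

```python
def rebuild_lba_table(blocks):
    """blocks: list of dicts {'phys': int, 'lba': int, 'valid_pages': int}
    scanned from on-media metadata after a power loss. Rebuilds the LBA
    lookup table, resolving duplicate claims on the same LBA in favor of
    the block with more valid physical pages."""
    table = {}
    for blk in blocks:
        cur = table.get(blk['lba'])
        if cur is None or blk['valid_pages'] > cur['valid_pages']:
            table[blk['lba']] = blk
    return {lba: blk['phys'] for lba, blk in table.items()}

scanned = [
    {'phys': 7,  'lba': 42, 'valid_pages': 63},   # stale copy
    {'phys': 19, 'lba': 42, 'valid_pages': 64},   # newer, fully written copy
]
print(rebuild_lba_table(scanned))   # {42: 19}
```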


Patent
31 Jul 2008
TL;DR: In this article, a RAID controller uses a method to identify a storage device of a redundant array of storage devices that is returning corrupt data to the RAID controller, and the method includes reading data from a location of each storage device in the redundant array a first time, and detecting that at least one storage device returned corrupt data.
Abstract: A RAID controller uses a method to identify a storage device of a redundant array of storage devices that is returning corrupt data to the RAID controller. The method includes reading data from a location of each storage device in the redundant array a first time, and detecting that at least one storage device returned corrupt data. In response to detecting corrupt data, steps are performed for each storage device in the redundant array. The steps include reading data from the location of the storage device a second time without writing to the location in between the first and second reads, comparing the data read the first and second times, and identifying the storage device as a failing storage device if the compared data has a miscompare. Finally, the method includes updating the location of each storage device to a new location and repeating the steps for the new location.

64 citations
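
The identification step reduces to a double-read miscompare per array member. A sketch under the patent's stated condition that no write occurs between the two reads (the FlakyDisk class is a hypothetical stand-in for a failing drive):

```python
def find_failing_devices(devices, location):
    """Read the same location twice from each array member, with no write in
    between; a member whose two reads miscompare is flagged as failing."""
    failing = []
    for i, dev in enumerate(devices):
        first = dev.read(location)
        second = dev.read(location)   # second read, no intervening write
        if first != second:
            failing.append(i)
    return failing

class FlakyDisk:
    """Hypothetical stand-in: a failing drive returns unstable data."""
    def __init__(self, flaky=False):
        self.flaky, self.toggle = flaky, False
    def read(self, location):
        self.toggle = not self.toggle
        return b"BAD" if (self.flaky and self.toggle) else b"GOOD"

print(find_failing_devices([FlakyDisk(), FlakyDisk(flaky=True)], 0))   # [1]
```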


Patent
17 Oct 2008
TL;DR: In this paper, a computer is programmed to execute a diagnostic procedure either on a pre-set schedule or asynchronously in response to an event, such as an error message, or a user command.
Abstract: A computer is programmed to execute a diagnostic procedure either on a pre-set schedule or asynchronously in response to an event, such as an error message, or a user command. When executed, the diagnostic procedure automatically checks for integrity of one or more portions of data in the computer, to identify any failure(s). In some embodiments, the failure(s) may be displayed to a human, after revalidation to exclude any failure that no longer exists.

54 citations


Journal ArticleDOI
TL;DR: The causes of UDEs and their effects on data integrity are discussed, some of the basic techniques that have been applied to address this problem at various software layers in the I/O stack are described and a family of solutions that can be integrated into the RAID subsystem are described.
Abstract: Though remarkably reliable, disk drives do fail occasionally. Most failures can be detected immediately; moreover, such failures can be modeled and addressed using technologies such as RAID (Redundant Arrays of Independent Disks). Unfortunately, disk drives can also experience errors that are undetected by the drive itself, which we refer to as undetected disk errors (UDEs). These errors can cause silent data corruption that may go completely undetected (until a system or application malfunction) or may be detected by software in the storage I/O stack. Continual increases in disk densities and storage array sizes, and more significantly the introduction of desktop-class drives in enterprise storage systems, are increasing the likelihood of UDEs in a given system. Therefore, the incorporation of UDE detection (and correction) into storage systems is necessary to prevent increasing numbers of data corruption and data loss events. In this paper, we discuss the causes of UDEs and their effects on data integrity. We describe some of the basic techniques that have been applied to address this problem at various software layers in the I/O stack, and we describe a family of solutions that can be integrated into the RAID subsystem.

51 citations
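
One representative member of such a RAID-layer family is to stamp each block with a checksum plus a version (sequence) number, so that both corrupted data and stale data from a lost or misdirected write become detectable. A hedged sketch of that general technique, not the paper's specific scheme (in practice the expected version would live in redundant metadata, e.g., alongside parity; here it is passed in for illustration):

```python
import zlib

def raid_write(store, lba, data, version):
    """RAID-layer write: stamp the block with a checksum and a version
    (sequence) number so lost or misdirected writes become detectable."""
    store[lba] = {'data': data, 'crc': zlib.crc32(data), 'version': version}

def raid_read(store, lba, expected_version):
    blk = store[lba]
    if zlib.crc32(blk['data']) != blk['crc']:
        return "corrupt data (UDE: media or transport corruption)"
    if blk['version'] != expected_version:
        return "stale data (UDE: lost or misdirected write)"
    return blk['data']

store = {}
raid_write(store, 5, b"v1", version=1)
# A lost write: the drive acknowledged version 2 but never persisted it.
print(raid_read(store, 5, expected_version=2))   # stale data (UDE: lost write)
```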


Patent
01 Oct 2008
TL;DR: In this paper, an auditor is initialized with a verification data set that confirms that an initial version of a data set stored by the storage provider is intact, and the auditor determines whether the second version matches the initial version.
Abstract: Various approaches for extracting a client's data from a storage provider are presented. In one approach, an auditor is initialized with a verification data set that confirms that an initial version of a data set stored by the storage provider is intact. The auditor extracts a second version of the data set from the storage provider; the second version hides the information specified by the data set from the auditor. The auditor determines whether the second version matches the initial version, and the second version is returned to the client if they match. The auditor is prevented from recovering the information specified by the data set from its state information, and the client need not store any state information related to the initial and second versions in order to recover that information. If the initial version does not match the second version, the auditor outputs data indicative of data corruption.

35 citations


Proceedings Article
26 Feb 2008
TL;DR: The first large-scale study of data corruption is presented in this paper, where the authors analyzed more than 400,000 instances of checksum mismatches over a period of 41 months.
Abstract: An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this paper, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur most frequently. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances, including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.

34 citations


Proceedings Article
26 Feb 2008
TL;DR: The effectiveness of SWEEPER as a robust strategy to significantly reduce recovery time is established and system administrators are allowed to perform trade-offs between recovery time and data currentness.
Abstract: Data corruption is one of the key problems on top of the radar screen of most CIOs. Continuous Data Protection (CDP) technologies help enterprises deal with data corruption by maintaining multiple versions of data and facilitating recovery by allowing an administrator to restore to an earlier clean version of the data. The aim of the recovery process after data corruption is to quickly traverse the backup copies (old versions) and retrieve a clean copy of the data. Currently, data recovery is an ad hoc, time-consuming, and frustrating process built on sequential brute-force approaches, where recovery time is proportional to the number of backup copies examined and the time to check a backup copy for data corruption. In this paper, we present the design and implementation of the SWEEPER architecture and backup copy selection algorithms that specifically tackle the problem of quickly and systematically identifying a good recovery point. We monitor various system events and generate checkpoint records that help in quickly identifying a clean backup copy. The SWEEPER methodology dynamically determines the selection algorithm based on user-specified recovery time and recovery point objectives, and thus allows system administrators to perform trade-offs between recovery time and data currentness. We have implemented our solution as part of a popular Storage Resource Manager product and evaluated SWEEPER under many diverse settings. Our study clearly establishes the effectiveness of SWEEPER as a robust strategy to significantly reduce recovery time.

23 citations
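
If one assumes corruption persists once introduced, the version history forms a clean prefix followed by a corrupt suffix, and the newest clean copy can be found by binary search instead of a sequential scan. SWEEPER's actual selection is richer, using checkpoint records and user-specified objectives; this is only a sketch of the search idea under that monotonicity assumption:

```python
def latest_clean_copy(copies, is_clean):
    """Binary-search ordered backup copies (oldest first) for the newest
    clean one, assuming the history is clean...clean corrupt...corrupt."""
    lo, hi, best = 0, len(copies) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_clean(copies[mid]):
            best, lo = copies[mid], mid + 1   # clean: try to find a newer one
        else:
            hi = mid - 1                      # corrupt: look earlier
    return best

copies = list(range(10))                            # 10 versions, oldest first
print(latest_clean_copy(copies, lambda v: v < 6))   # 5: newest clean version
```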


Proceedings ArticleDOI
10 Aug 2008
TL;DR: This work presents a generic and elegant approach by using a highly fault secure algebraic structure that is compatible to finite fields and rings and preserves its error detection property throughout addition and multiplication.
Abstract: So far many software countermeasures against fault attacks have been proposed. However, most of them are tailored to a specific cryptographic algorithm or focus on securing the processed data only. In this work we present a generic and elegant approach by using a highly fault secure algebraic structure. This structure is compatible to finite fields and rings and preserves its error detection property throughout addition and multiplication. Additionally, we introduce a method to generate a fingerprint of the instruction sequence. Thus, it is possible to check the result for data corruption as well as for modifications in the program flow. This is even possible if the order of the instructions is randomized. Furthermore, the properties of the countermeasure allow the deployment of error detection as well as error diffusion. We point out that the overhead for the calculations and for the error checking within this structure is reasonable and that the transformations are efficient. In addition we discuss how our approach increases the security in various kinds of fault scenarios.
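
AN codes are a classic instance of such an error-detecting algebraic structure: a value x is encoded as A*x, divisibility by A is preserved across addition and (after renormalization) multiplication, and a residue check flags faults. The paper's construction is richer; this is an illustrative sketch, with A = 58659 chosen as a check constant commonly cited in the AN-code literature:

```python
A = 58659  # check constant; codewords are exactly the multiples of A

def encode(x):
    return A * x

def check(c):
    if c % A != 0:            # residue check: faults leaving the code are caught
        raise ValueError("fault detected: residue check failed")
    return c

def an_add(c1, c2):
    return check(c1 + c2)     # A*x + A*y = A*(x + y): addition stays in the code

def an_mul(c1, c2):
    return check(check(c1) * check(c2) // A)   # A^2*x*y renormalized to A*(x*y)

c = an_mul(an_add(encode(3), encode(4)), encode(2))   # computes (3 + 4) * 2
print(c // A)                                         # 14

try:
    check(encode(7) + 1)      # simulate a single-bit fault on a codeword
except ValueError as e:
    print(e)                  # fault detected: residue check failed
```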

Patent
17 Nov 2008
TL;DR: A duplicate address discovery process detects duplicate MAC addresses or duplicate unique port identifiers within the network, alerts attached devices of the duplicates, and takes action to avoid data corruption that might be caused by such duplicate addresses.
Abstract: A duplicate address discovery process detects duplicate MAC addresses or duplicate unique port identifiers within the network, alerts attached devices of the duplicates, and takes action to avoid data corruption that might be caused by such duplicate addresses.
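
The discovery pass itself amounts to grouping observed addresses by port and flagging any address seen on more than one port. A minimal sketch (the port/MAC table is a hypothetical stand-in for fabric state):

```python
from collections import defaultdict

def find_duplicate_macs(ports):
    """ports: iterable of (port_id, mac). Returns the MAC addresses that
    appear on more than one port, with the set of offending ports."""
    seen = defaultdict(set)
    for port_id, mac in ports:
        seen[mac].add(port_id)
    return {mac: ids for mac, ids in seen.items() if len(ids) > 1}

fabric = [("p1", "00:11:22:33:44:55"), ("p2", "66:77:88:99:aa:bb"),
          ("p3", "00:11:22:33:44:55")]
for mac, port_ids in find_duplicate_macs(fabric).items():
    # Alert attached devices, then fence one port to avoid data corruption.
    print(f"alert: duplicate MAC {mac} on ports {sorted(port_ids)}")
```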

Patent
29 Sep 2008
TL;DR: In this paper, a data set is provided from a client to a storage provider, and the data sets are stored at a first storage arrangement by the storage provider and the auditor outputs data indicative of data corruption in response to determining that the data set stored at the first storage arrangements is corrupt.
Abstract: Various approaches are described for auditing integrity of stored data. In one approach, a data set is provided from a client to a storage provider, and the data set is stored at a first storage arrangement by the storage provider. An auditor determines whether the data set stored at the first storage arrangement is corrupt without reliance on any part of the data set and any derivative of any part of the data set stored by the client. While the auditor is determining whether the data set stored at the first storage arrangement is corrupt, the auditor is prevented from being exposed to information specified by the data set. The auditor outputs data indicative of data corruption in response to determining that the data set stored at the first storage arrangement is corrupt.

Patent
14 Feb 2008
TL;DR: In this article, a data management device extracts the ID numbers of all pages to generate memory management information, manages the storage position of the data identified by each ID number, and determines a writable area for new data by identifying contiguous pages whose ID numbers indicate non-use.
Abstract: PROBLEM TO BE SOLVED: To efficiently store data and backup data in a NAND flash memory while improving its resistance to memory corruption. SOLUTION: When data is stored in the NAND flash memory 11, pages are used in order. An identification (ID) number is assigned to each piece of data to be stored, and the ID number is inserted into the header of that data. The data management device extracts the ID numbers of all pages to generate memory management information, manages the storage position of the data identified by each ID number, and determines a writable area for new data by identifying contiguous pages whose ID numbers indicate non-use. When a header checksum reveals data corruption, earlier pages are searched for previously stored data with the same ID number, and the data is restored from that copy.
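
A sketch of the two mechanisms described: rebuild the ID-to-data map by scanning page headers oldest to newest, keeping only copies whose checksum verifies, so a corrupted newest copy automatically falls back to the previous page carrying the same ID (the page layout and CRC choice are illustrative assumptions):

```python
import zlib

def scan_pages(pages):
    """pages: list of (id_number, payload, crc), oldest first. Returns a map
    id_number -> newest payload whose checksum verifies; if the newest copy
    is corrupt, an earlier copy of the same ID remains in effect."""
    table = {}
    for id_number, payload, crc in pages:        # scan oldest -> newest
        if zlib.crc32(payload) == crc:
            table[id_number] = payload           # newest valid copy wins
    return table

old = b"config-v1"
new = b"config-v2"
pages = [(7, old, zlib.crc32(old)),
         (7, new, zlib.crc32(new) ^ 1)]          # newest copy is corrupted
print(scan_pages(pages))                         # {7: b'config-v1'}: restored
```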

Patent
26 Mar 2008
TL;DR: In this article, the authors propose a method of verifying the integrity of data acquired from a device emulating a hard disk to a host computer over a data transfer pathway by comparing a characteristic of the data stored on the host computer with a corresponding characteristic of known data to determine whether data corruption has occurred during data transfer over the data transfer path.
Abstract: Disclosed is a method of verifying the integrity of data acquired from a device emulating a hard disk to a host computer over a data transfer pathway. A storage medium containing known data is connected to the data transfer pathway, the storage medium capable of emulating a hard disk. The known data is transferred from the storage medium to the host computer over the data transfer pathway for storage on the host computer. A characteristic of the data stored on the host computer is compared with a corresponding characteristic of said known data to determine whether data corruption has occurred during data transfer over said data transfer pathway. The characteristic could be a hash code value, such as a Message-Digest 5 (MD5) or Secure Hash Algorithm (SHA) value.
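
The comparison step maps directly onto a standard digest computation; the patent names MD5 and SHA as candidate characteristics. A minimal sketch using SHA-256 from Python's hashlib:

```python
import hashlib

def transfer_is_intact(known: bytes, received: bytes) -> bool:
    """Compare a digest of the known reference data with a digest of the data
    that arrived over the transfer pathway; a mismatch indicates corruption."""
    return hashlib.sha256(known).digest() == hashlib.sha256(received).digest()

known = b"reference pattern stored on the emulating device"
print(transfer_is_intact(known, known))        # True: transfer was clean
print(transfer_is_intact(known, known[:-1]))   # False: corrupted in transit
```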

Patent
19 Feb 2008
TL;DR: In this article, the authors provide methods and apparatus for preventing data corruption on a storage device by integrating a journaling file system with a cache system to provide both a robust file system integrity and an efficient reading and writing mechanism.
Abstract: The present principles provide methods and apparatus for preventing data corruption on a storage device by integrating a journaling file system with a cache system. To ensure journal accuracy with respect to data that is most likely to affect file system integrity, a method in accordance with an aspect of the present principles includes bypassing the cache when writing (412) such data to a main platter of a storage device. Furthermore, to ensure overall efficiency in reading and writing data, a method in accordance with an aspect of the present principles includes writing (420) to a cache, in addition to writing to the platter (428), data that has a relatively less damaging effect on file system integrity. Thus, aspects of the present principles optimally integrate a cache system with a journaling file system to provide both a robust file system integrity and an efficient reading and writing mechanism.
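
A sketch of the routing decision at the heart of the scheme: integrity-critical blocks (journal and metadata writes) bypass the volatile cache so the journal never records a write that only reached the cache, while ordinary data takes the fast cached path and is still destaged to the platter. The list-based cache and platter are hypothetical stand-ins:

```python
def route_write(block, critical, cache, platter):
    """Route a write according to its impact on file system integrity:
    critical blocks go synchronously to the platter only; ordinary data
    is cached for speed and also written through to the platter."""
    if critical:
        platter.append(block)      # bypass the cache: journal stays accurate
    else:
        cache.append(block)        # fast path for ordinary data...
        platter.append(block)      # ...still persisted on the platter

cache, platter = [], []
route_write("journal-commit", True, cache, platter)
route_write("user-data", False, cache, platter)
print(cache, platter)   # ['user-data'] ['journal-commit', 'user-data']
```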

Patent
Atul Mukker
03 Dec 2008
TL;DR: In this paper, a system and method for preventing data corruption after power failure is described, which may include receiving at least one of a read command or a write command, storing information on an array of disk drives, and storing persistent information on a journaling drive.
Abstract: A system and method for preventing data corruption after power failure is described. The system may include a host server, a disk array, a journaling disk, and/or a RAID controller. A method for preventing data corruption after power failure may include receiving at least one of a read command or a write command, storing information on an array of disk drives at least partially based on receiving the at least one of a read command or a write command, and storing persistent information on a journaling drive.

Patent
17 Apr 2008
TL;DR: In this paper, a method for controlling access to a shared memory is proposed, which comprises the steps of: judging whether or not there is data corruption in one of said at least two headers; and copying the control information in any one of other headers to said one header if there is data corruption.
Abstract: The shared memory includes a header section and a data section, wherein said header section includes at least two headers in which control information is stored. The method comprises the steps of: judging whether or not there is data corruption in one of said at least two headers; and copying the control information in any one of other headers to said one header if there is data corruption in said one header. A method for controlling access to a shared memory is also disclosed.
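
A sketch of the judge-and-copy repair step, assuming each header carries a checksum over its control information (the CRC field is an illustrative assumption; the patent only requires some way to judge corruption):

```python
import zlib

def repair_headers(headers):
    """headers: list of {'control': bytes, 'crc': int}. If a header fails its
    checksum, overwrite it with a copy of any header that verifies."""
    good = next((h for h in headers if zlib.crc32(h['control']) == h['crc']), None)
    if good is None:
        raise IOError("all headers corrupt: control information lost")
    for h in headers:
        if zlib.crc32(h['control']) != h['crc']:
            h['control'], h['crc'] = good['control'], good['crc']
    return headers

ctrl = b"alloc-map v17"
headers = [{'control': ctrl, 'crc': zlib.crc32(ctrl)},
           {'control': b"garbage", 'crc': 0}]       # corrupted second header
print(repair_headers(headers)[1]['control'])        # b'alloc-map v17'
```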

Patent
Xin Hua Liu, Xiao Song Ran
29 Oct 2008
TL;DR: In this article, a method and equipment for reducing data corruption in a shared storage are disclosed; the storage comprises a header section and a data section, where the header section contains at least two headers storing control information.
Abstract: The invention discloses a method and equipment for reducing data corruption in a shared storage. The shared storage comprises a header section and a data section, wherein the header section comprises at least two headers that store control information. The method comprises the following steps: judging whether one of the at least two headers has data corruption; and, if so, copying the control information from any one of the other headers to that header. The invention also discloses a method for controlling access to the shared storage.

01 Jan 2008
TL;DR: The results from the first large-scale field study of data corruption are presented, which analyzes over 400,000 corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months.
Abstract: One of the biggest challenges in designing storage systems is providing the reliability and availability that users expect. A serious threat to reliability is silent data corruption (i.e., corruption not detected by the disk drive). In order to develop suitable protection mechanisms against corruption, it is essential to understand its characteristics. In this article, we present the results from the first large-scale field study of data corruption. We analyze over 400,000 corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months.
