
Showing papers on "Data Corruption published in 2008"


Journal ArticleDOI
TL;DR: This article presents the first large-scale study of data corruption, which analyzes corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months.
Abstract: An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this article, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur most frequently. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances, including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.

312 citations
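
The detection mechanism behind these mismatch counts is checksum-on-write, verify-on-read. A minimal sketch of that path (the block size, CRC choice, and in-memory store are illustrative assumptions, not the paper's implementation):

```python
import zlib

BLOCK_SIZE = 4096

def write_block(store, addr, data):
    """Store a block together with a checksum computed at write time."""
    store[addr] = (data, zlib.crc32(data))

def read_block(store, addr):
    """Recompute the checksum on read; a mismatch flags silent corruption."""
    data, stored_crc = store[addr]
    if zlib.crc32(data) != stored_crc:
        raise IOError(f"checksum mismatch at block {addr}")
    return data

store = {}
write_block(store, 0, b"x" * BLOCK_SIZE)
data, crc = store[0]
store[0] = (b"y" + data[1:], crc)   # simulate a silent bit flip on the media
try:
    read_block(store, 0)
except IOError as e:
    print(e)                        # detected: checksum mismatch at block 0
```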


Proceedings Article
26 Feb 2008
TL;DR: This work uses model checking to evaluate whether common protection techniques used in parity-based RAID systems are sufficient in light of the increasingly complex failure modes of modern disk drives and identifies a parity pollution problem that spreads corrupt data across multiple disks, thus leading to data loss or corruption.
Abstract: RAID storage systems protect data from storage errors, such as data corruption, using a set of one or more integrity techniques, such as checksums. The exact protection offered by certain techniques or a combination of techniques is sometimes unclear. We introduce and apply a formal method of analyzing the design of data protection strategies. Specifically, we use model checking to evaluate whether common protection techniques used in parity-based RAID systems are sufficient in light of the increasingly complex failure modes of modern disk drives. We evaluate the approaches taken by a number of real systems under single-error conditions, and find flaws in every scheme. In particular, we identify a parity pollution problem that spreads corrupt data (the result of a single error) across multiple disks, thus leading to data loss or corruption. We further identify which protection measures must be used to avoid such problems. Finally, we show how to combine real-world failure data with the results from the model checker to estimate the actual likelihood of data loss of different protection strategies.

102 citations
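
To see why parity pollution is destructive, consider a single-parity XOR stripe: once parity is recomputed from an already-corrupt block, reconstruction faithfully returns the corrupt value. A toy sketch of the failure mode the paper identifies (not its model checker):

```python
from functools import reduce

def xor_parity(blocks):
    """XOR all blocks together, byte by byte (RAID-5-style single parity)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripe = [b"\x01" * 4, b"\x02" * 4, b"\x04" * 4]
parity = xor_parity(stripe)

stripe[1] = b"\xff" * 4        # silent corruption of one data block
# A later parity recomputation reads the corrupt block and "repairs" parity:
parity = xor_parity(stripe)    # parity pollution: corruption is now consistent

# Reconstructing block 1 from the survivors returns the corrupt value,
# so the original data can no longer be recovered from parity alone.
rebuilt = xor_parity([stripe[0], stripe[2], parity])
assert rebuilt == b"\xff" * 4
print("reconstruction returns the corrupted block: data loss")
```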


Proceedings ArticleDOI
24 Jun 2008
TL;DR: A method is developed for using data diversity in N-variant systems to provide high-assurance arguments against a class of data corruption attacks, demonstrated with a case study that thwarts attacks corrupting UID values.
Abstract: Unlike other diversity-based approaches, N-variant systems thwart attacks without requiring secrets. Instead, they use redundancy (to require an attacker to simultaneously compromise multiple variants with the same input) and tailored diversity (to make it impossible to compromise all the variants with the same input for given attack classes). In this work, we develop a method for using data diversity in N-variant systems to provide high-assurance arguments against a class of data corruption attacks. Data is transformed in the variants so identical concrete data values have different interpretations. In order to corrupt the data without detection, an attacker would need to alter the corresponding data in each variant in a different way while sending the same inputs to all variants. We demonstrate our approach with a case study that thwarts attacks that corrupt UID values.

79 citations
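
A hedged sketch of the data-diversity idea applied to UID values: each variant stores the UID under a different encoding, so a single concrete value injected into both variants decodes to two different UIDs and the divergence is detected. The two encodings below (identity and XOR-complement) are illustrative assumptions, not the paper's transformations:

```python
# Two variants encode the same logical UID differently; a memory-corruption
# attack writes one concrete value into both address spaces.
ENCODE = [lambda uid: uid,            # variant 0: identity encoding
          lambda uid: uid ^ 0xFFFF]   # variant 1: complemented encoding

def decode(variant, value):
    return ENCODE[variant](value)     # both encodings are involutions

uid = 1000
stored = [ENCODE[0](uid), ENCODE[1](uid)]

attacker_value = 0                    # attacker tries to force UID 0 (root)
stored = [attacker_value, attacker_value]

views = [decode(v, s) for v, s in enumerate(stored)]
if views[0] != views[1]:
    print("divergence detected: variants disagree on UID", views)
```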


Patent
17 Jun 2008
TL;DR: In this article, a controller is disclosed that is adapted to control read/write access to a storage media, which includes data corruption detection logic to reconstruct a logical block address (LBA) lookup table from metadata stored at the storage media upon restart and re-initialization after a power loss event.
Abstract: In a particular embodiment, a controller is disclosed that is adapted to control read/write access to a storage media. The controller includes data corruption detection logic to reconstruct a logical block address (LBA) lookup table from metadata stored at the storage media upon restart and re-initialization after a power loss event. The controller further includes duplicate conflict resolution logic to identify a valid data block from multiple data blocks that refer to a single LBA. The duplicate conflict resolution logic counts a first number of valid physical pages and a second number of different sectors in each of the multiple data blocks. The duplicate conflict resolution logic selects the valid data block from the multiple data blocks based on at least one of the first and second numbers.

74 citations
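
A minimal sketch of the reconstruction-plus-conflict-resolution step (the field names and the single tie-break on valid-page counts are simplifying assumptions; the patent also counts distinct sectors):

```python
def rebuild_lba_table(blocks):
    """blocks: list of dicts {'phys': int, 'lba': int, 'valid_pages': int}
    scanned from on-media metadata after a power loss. Rebuilds the LBA
    lookup table, resolving duplicate claims on the same LBA in favor of
    the block with more valid physical pages."""
    table = {}
    for blk in blocks:
        cur = table.get(blk['lba'])
        if cur is None or blk['valid_pages'] > cur['valid_pages']:
            table[blk['lba']] = blk
    return {lba: blk['phys'] for lba, blk in table.items()}

scanned = [
    {'phys': 7,  'lba': 42, 'valid_pages': 63},   # stale copy
    {'phys': 19, 'lba': 42, 'valid_pages': 64},   # newer, fully written copy
]
print(rebuild_lba_table(scanned))   # {42: 19}
```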


Patent
31 Jul 2008
TL;DR: In this article, a RAID controller uses a method to identify a storage device of a redundant array of storage devices that is returning corrupt data to the RAID controller, and the method includes reading data from a location of each storage device in the redundant array a first time, and detecting that at least one storage device returned corrupt data.
Abstract: A RAID controller uses a method to identify a storage device of a redundant array of storage devices that is returning corrupt data to the RAID controller. The method includes reading data from a location of each storage device in the redundant array a first time, and detecting that at least one storage device returned corrupt data. In response to detecting corrupt data, steps are performed for each storage device in the redundant array. The steps include reading data from the location of the storage device a second time without writing to the location in between the first and second reads, comparing the data read the first and second times, and identifying the storage device as a failing storage device if the compared data has a miscompare. Finally, the method includes updating the location of each storage device to a new location and repeating the steps for the new location.

64 citations
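
The identification step reduces to a double-read miscompare per array member. A sketch under the patent's stated condition that no write occurs between the two reads (the FlakyDisk class is a hypothetical stand-in for a failing drive):

```python
def find_failing_devices(devices, location):
    """Read the same location twice from each array member, with no write in
    between; a member whose two reads miscompare is flagged as failing."""
    failing = []
    for i, dev in enumerate(devices):
        first = dev.read(location)
        second = dev.read(location)   # second read, no intervening write
        if first != second:
            failing.append(i)
    return failing

class FlakyDisk:
    """Hypothetical stand-in: a failing drive returns unstable data."""
    def __init__(self, flaky=False):
        self.flaky, self.toggle = flaky, False
    def read(self, location):
        self.toggle = not self.toggle
        return b"BAD" if (self.flaky and self.toggle) else b"GOOD"

print(find_failing_devices([FlakyDisk(), FlakyDisk(flaky=True)], 0))   # [1]
```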


Patent
17 Oct 2008
TL;DR: In this paper, a computer is programmed to execute a diagnostic procedure either on a pre-set schedule or asynchronously in response to an event, such as an error message, or a user command.
Abstract: A computer is programmed to execute a diagnostic procedure either on a pre-set schedule or asynchronously in response to an event, such as an error message, or a user command. When executed, the diagnostic procedure automatically checks for integrity of one or more portions of data in the computer, to identify any failure(s). In some embodiments, the failure(s) may be displayed to a human, after revalidation to exclude any failure that no longer exists.

54 citations


Journal ArticleDOI
TL;DR: The causes of UDEs and their effects on data integrity are discussed, some of the basic techniques that have been applied to address this problem at various software layers in the I/O stack are described and a family of solutions that can be integrated into the RAID subsystem are described.
Abstract: Though remarkably reliable, disk drives do fail occasionally. Most failures can be detected immediately; moreover, such failures can be modeled and addressed using technologies such as RAID (Redundant Arrays of Independent Disks). Unfortunately, disk drives can also experience errors that are undetected by the drive itself, which we refer to as undetected disk errors (UDEs). These errors can cause silent data corruption that may go completely undetected (until a system or application malfunction) or may be detected by software in the storage I/O stack. Continual increases in disk densities and storage array sizes, and more significantly the introduction of desktop-class drives in enterprise storage systems, are increasing the likelihood of UDEs in a given system. Therefore, the incorporation of UDE detection (and correction) into storage systems is necessary to prevent increasing numbers of data corruption and data loss events. In this paper, we discuss the causes of UDEs and their effects on data integrity. We describe some of the basic techniques that have been applied to address this problem at various software layers in the I/O stack, and we describe a family of solutions that can be integrated into the RAID subsystem.

51 citations
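
One representative member of such a RAID-layer family is to stamp each block with a checksum plus a version (sequence) number, so that both corrupted data and stale data from a lost or misdirected write become detectable. A hedged sketch of that general technique, not the paper's specific scheme (in practice the expected version would live in redundant metadata, e.g., alongside parity; here it is passed in for illustration):

```python
import zlib

def raid_write(store, lba, data, version):
    """RAID-layer write: stamp the block with a checksum and a version
    (sequence) number so lost or misdirected writes become detectable."""
    store[lba] = {'data': data, 'crc': zlib.crc32(data), 'version': version}

def raid_read(store, lba, expected_version):
    blk = store[lba]
    if zlib.crc32(blk['data']) != blk['crc']:
        return "corrupt data (UDE: media or transport corruption)"
    if blk['version'] != expected_version:
        return "stale data (UDE: lost or misdirected write)"
    return blk['data']

store = {}
raid_write(store, 5, b"v1", version=1)
# A lost write: the drive acknowledged version 2 but never persisted it.
print(raid_read(store, 5, expected_version=2))   # stale data (UDE: lost write)
```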


Patent
01 Oct 2008
TL;DR: In this paper, an auditor is initialized with a verification data set that confirms that an initial version of a data set stored by the storage provider is intact, and the auditor determines whether the second version matches the initial version.
Abstract: Various approaches for extracting a client's data from a storage provider are presented. In one approach, an auditor is initialized with a verification data set that confirms that an initial version of a data set stored by the storage provider is intact. The auditor extracts a second version of the data set from the storage provider; the second version hides the information specified by the data set from the auditor. The auditor determines whether the second version matches the initial version, and the second version is returned to the client if they match. The auditor is prevented from recovering the information specified by the data set from its state information, and the client need not store any state information related to the initial and second versions in order to recover that information. If the initial version does not match the second version, the auditor outputs data indicative of data corruption.

35 citations


Proceedings Article
26 Feb 2008
TL;DR: The first large-scale study of data corruption is presented in this paper, where the authors analyzed more than 400,000 instances of checksum mismatches over a period of 41 months.
Abstract: An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this paper, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur most frequently. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances, including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.

34 citations


Proceedings Article
26 Feb 2008
TL;DR: The effectiveness of SWEEPER as a robust strategy to significantly reduce recovery time is established and system administrators are allowed to perform trade-offs between recovery time and data currentness.
Abstract: Data corruption is one of the key problems on top of the radar screen of most CIOs. Continuous Data Protection (CDP) technologies help enterprises deal with data corruption by maintaining multiple versions of data and facilitating recovery by allowing an administrator to restore to an earlier clean version of the data. The aim of the recovery process after data corruption is to quickly traverse the backup copies (old versions) and retrieve a clean copy of the data. Currently, data recovery is an ad hoc, time-consuming, and frustrating process built on sequential brute-force approaches, where recovery time is proportional to the number of backup copies examined and the time to check a backup copy for data corruption. In this paper, we present the design and implementation of the SWEEPER architecture and backup copy selection algorithms that specifically tackle the problem of quickly and systematically identifying a good recovery point. We monitor various system events and generate checkpoint records that help in quickly identifying a clean backup copy. The SWEEPER methodology dynamically determines the selection algorithm based on user-specified recovery time and recovery point objectives, and thus allows system administrators to perform trade-offs between recovery time and data currentness. We have implemented our solution as part of a popular Storage Resource Manager product and evaluated SWEEPER under many diverse settings. Our study clearly establishes the effectiveness of SWEEPER as a robust strategy to significantly reduce recovery time.

23 citations
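
If one assumes corruption persists once introduced, the version history forms a clean prefix followed by a corrupt suffix, and the newest clean copy can be found by binary search instead of a sequential scan. SWEEPER's actual selection is richer, using checkpoint records and user-specified objectives; this is only a sketch of the search idea under that monotonicity assumption:

```python
def latest_clean_copy(copies, is_clean):
    """Binary-search ordered backup copies (oldest first) for the newest
    clean one, assuming the history is clean...clean corrupt...corrupt."""
    lo, hi, best = 0, len(copies) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_clean(copies[mid]):
            best, lo = copies[mid], mid + 1   # clean: try to find a newer one
        else:
            hi = mid - 1                      # corrupt: look earlier
    return best

copies = list(range(10))                            # 10 versions, oldest first
print(latest_clean_copy(copies, lambda v: v < 6))   # 5: newest clean version
```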


Proceedings ArticleDOI
10 Aug 2008
TL;DR: This work presents a generic and elegant approach by using a highly fault secure algebraic structure that is compatible to finite fields and rings and preserves its error detection property throughout addition and multiplication.
Abstract: So far many software countermeasures against fault attacks have been proposed. However, most of them are tailored to a specific cryptographic algorithm or focus on securing the processed data only. In this work we present a generic and elegant approach by using a highly fault secure algebraic structure. This structure is compatible to finite fields and rings and preserves its error detection property throughout addition and multiplication. Additionally, we introduce a method to generate a fingerprint of the instruction sequence. Thus, it is possible to check the result for data corruption as well as for modifications in the program flow. This is even possible if the order of the instructions is randomized. Furthermore, the properties of the countermeasure allow the deployment of error detection as well as error diffusion. We point out that the overhead for the calculations and for the error checking within this structure is reasonable and that the transformations are efficient. In addition we discuss how our approach increases the security in various kinds of fault scenarios.
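
AN codes are a classic instance of such an error-detecting algebraic structure: a value x is encoded as A*x, divisibility by A is preserved across addition and (after renormalization) multiplication, and a residue check flags faults. The paper's construction is richer; this is an illustrative sketch, with A = 58659 chosen as a check constant commonly cited in the AN-code literature:

```python
A = 58659  # check constant; codewords are exactly the multiples of A

def encode(x):
    return A * x

def check(c):
    if c % A != 0:            # residue check: faults leaving the code are caught
        raise ValueError("fault detected: residue check failed")
    return c

def an_add(c1, c2):
    return check(c1 + c2)     # A*x + A*y = A*(x + y): addition stays in the code

def an_mul(c1, c2):
    return check(check(c1) * check(c2) // A)   # A^2*x*y renormalized to A*(x*y)

c = an_mul(an_add(encode(3), encode(4)), encode(2))   # computes (3 + 4) * 2
print(c // A)                                         # 14

try:
    check(encode(7) + 1)      # simulate a single-bit fault on a codeword
except ValueError as e:
    print(e)                  # fault detected: residue check failed
```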

Patent
17 Nov 2008
TL;DR: A duplicate address discovery process detects duplicate MAC addresses or duplicate unique port identifiers within the network, alerts attached devices of the duplicates, and takes action to avoid data corruption that might be caused by such duplicate addresses.
Abstract: A duplicate address discovery process detects duplicate MAC addresses or duplicate unique port identifiers within the network, alerts attached devices of the duplicates, and takes action to avoid data corruption that might be caused by such duplicate addresses.
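
The discovery pass itself amounts to grouping observed addresses by port and flagging any address seen on more than one port. A minimal sketch (the port/MAC table is a hypothetical stand-in for fabric state):

```python
from collections import defaultdict

def find_duplicate_macs(ports):
    """ports: iterable of (port_id, mac). Returns the MAC addresses that
    appear on more than one port, with the set of offending ports."""
    seen = defaultdict(set)
    for port_id, mac in ports:
        seen[mac].add(port_id)
    return {mac: ids for mac, ids in seen.items() if len(ids) > 1}

fabric = [("p1", "00:11:22:33:44:55"), ("p2", "66:77:88:99:aa:bb"),
          ("p3", "00:11:22:33:44:55")]
for mac, port_ids in find_duplicate_macs(fabric).items():
    # Alert attached devices, then fence one port to avoid data corruption.
    print(f"alert: duplicate MAC {mac} on ports {sorted(port_ids)}")
```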

Patent
29 Sep 2008
TL;DR: In this paper, a data set is provided from a client to a storage provider, and the data sets are stored at a first storage arrangement by the storage provider and the auditor outputs data indicative of data corruption in response to determining that the data set stored at the first storage arrangements is corrupt.
Abstract: Various approaches are described for auditing integrity of stored data. In one approach, a data set is provided from a client to a storage provider, and the data set is stored at a first storage arrangement by the storage provider. An auditor determines whether the data set stored at the first storage arrangement is corrupt without reliance on any part of the data set and any derivative of any part of the data set stored by the client. While the auditor is determining whether the data set stored at the first storage arrangement is corrupt, the auditor is prevented from being exposed to information specified by the data set. The auditor outputs data indicative of data corruption in response to determining that the data set stored at the first storage arrangement is corrupt.

Patent
14 Feb 2008
TL;DR: In this article, a data management device extracts the ID numbers of all pages to generate memory management information, manages the storage position of the data identified by each ID number, and determines a writable area for new data by identifying contiguous pages whose ID numbers indicate non-use.
Abstract: PROBLEM TO BE SOLVED: To efficiently store data and backup data in a NAND flash memory while improving its resistance to memory corruption. SOLUTION: When data is stored in the NAND flash memory 11, pages are used in order. An identification (ID) number is assigned to each piece of data to be stored, and the ID number is inserted into the header of that data. The data management device extracts the ID numbers of all pages to generate memory management information, manages the storage position of the data identified by each ID number, and determines a writable area for new data by identifying contiguous pages whose ID numbers indicate non-use. When a header checksum reveals data corruption, earlier pages are searched for previously stored data with the same ID number, and the data is restored from that copy.
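
A sketch of the two mechanisms described: rebuild the ID-to-data map by scanning page headers oldest to newest, keeping only copies whose checksum verifies, so a corrupted newest copy automatically falls back to the previous page carrying the same ID (the page layout and CRC choice are illustrative assumptions):

```python
import zlib

def scan_pages(pages):
    """pages: list of (id_number, payload, crc), oldest first. Returns a map
    id_number -> newest payload whose checksum verifies; if the newest copy
    is corrupt, an earlier copy of the same ID remains in effect."""
    table = {}
    for id_number, payload, crc in pages:        # scan oldest -> newest
        if zlib.crc32(payload) == crc:
            table[id_number] = payload           # newest valid copy wins
    return table

old = b"config-v1"
new = b"config-v2"
pages = [(7, old, zlib.crc32(old)),
         (7, new, zlib.crc32(new) ^ 1)]          # newest copy is corrupted
print(scan_pages(pages))                         # {7: b'config-v1'}: restored
```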

Patent
26 Mar 2008
TL;DR: In this article, the authors propose a method of verifying the integrity of data acquired from a device emulating a hard disk to a host computer over a data transfer pathway by comparing a characteristic of the data stored on the host computer with a corresponding characteristic of known data to determine whether data corruption has occurred during data transfer over the data transfer path.
Abstract: Disclosed is a method of verifying the integrity of data acquired from a device emulating a hard disk to a host computer over a data transfer pathway. A storage medium containing known data is connected to the data transfer pathway, the storage medium capable of emulating a hard disk. The known data is transferred from the storage medium to the host computer over the data transfer pathway for storage on the host computer. A characteristic of the data stored on the host computer is compared with a corresponding characteristic of said known data to determine whether data corruption has occurred during data transfer over said data transfer pathway. The characteristic could be a hash code value, such as a Message-Digest 5 (MD5) or Secure Hash Algorithm (SHA) value.
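
The comparison step maps directly onto a standard digest computation; the patent names MD5 and SHA as candidate characteristics. A minimal sketch using SHA-256 from Python's hashlib:

```python
import hashlib

def transfer_is_intact(known: bytes, received: bytes) -> bool:
    """Compare a digest of the known reference data with a digest of the data
    that arrived over the transfer pathway; a mismatch indicates corruption."""
    return hashlib.sha256(known).digest() == hashlib.sha256(received).digest()

known = b"reference pattern stored on the emulating device"
print(transfer_is_intact(known, known))        # True: transfer was clean
print(transfer_is_intact(known, known[:-1]))   # False: corrupted in transit
```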

Patent
19 Feb 2008
TL;DR: In this article, the authors provide methods and apparatus for preventing data corruption on a storage device by integrating a journaling file system with a cache system to provide both a robust file system integrity and an efficient reading and writing mechanism.
Abstract: The present principles provide methods and apparatus for preventing data corruption on a storage device by integrating a journaling file system with a cache system. To ensure journal accuracy with respect to data that is most likely to affect file system integrity, a method in accordance with an aspect of the present principles includes bypassing the cache when writing (412) such data to a main platter of a storage device. Furthermore, to ensure overall efficiency in reading and writing data, a method in accordance with an aspect of the present principles includes writing (420) to a cache, in addition to writing to the platter (428), data that has a relatively less damaging effect on file system integrity. Thus, aspects of the present principles optimally integrate a cache system with a journaling file system to provide both a robust file system integrity and an efficient reading and writing mechanism.
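
A sketch of the routing decision at the heart of the scheme: integrity-critical blocks (journal and metadata writes) bypass the volatile cache so the journal never records a write that only reached the cache, while ordinary data takes the fast cached path and is still destaged to the platter. The list-based cache and platter are hypothetical stand-ins:

```python
def route_write(block, critical, cache, platter):
    """Route a write according to its impact on file system integrity:
    critical blocks go synchronously to the platter only; ordinary data
    is cached for speed and also written through to the platter."""
    if critical:
        platter.append(block)      # bypass the cache: journal stays accurate
    else:
        cache.append(block)        # fast path for ordinary data...
        platter.append(block)      # ...still persisted on the platter

cache, platter = [], []
route_write("journal-commit", True, cache, platter)
route_write("user-data", False, cache, platter)
print(cache, platter)   # ['user-data'] ['journal-commit', 'user-data']
```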

Patent
Atul Mukker
03 Dec 2008
TL;DR: In this paper, a system and method for preventing data corruption after power failure is described, which may include receiving at least one of a read command or a write command, storing information on an array of disk drives, and storing persistent information on a journaling drive.
Abstract: A system and method for preventing data corruption after power failure is described. The system may include a host server, a disk array, a journaling disk, and/or a RAID controller. A method for preventing data corruption after power failure may include receiving at least one of a read command or a write command, storing information on an array of disk drives at least partially based on receiving the at least one of a read command or a write command, and storing persistent information on a journaling drive.

Patent
17 Apr 2008
TL;DR: In this paper, a method for controlling access to a shared memory is proposed, which comprises the steps of: judging whether or not there is data corruption in one of said at least two headers; and copying the control information in any one of other headers to said one header if there is data corruption.
Abstract: The shared memory includes a header section and a data section, wherein said header section includes at least two headers in which control information is stored. The method comprises the steps of: judging whether or not there is data corruption in one of said at least two headers; and copying the control information in any one of other headers to said one header if there is data corruption in said one header. A method for controlling access to a shared memory is also disclosed.
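
A sketch of the judge-and-copy repair step, assuming each header carries a checksum over its control information (the CRC field is an illustrative assumption; the patent only requires some way to judge corruption):

```python
import zlib

def repair_headers(headers):
    """headers: list of {'control': bytes, 'crc': int}. If a header fails its
    checksum, overwrite it with a copy of any header that verifies."""
    good = next((h for h in headers if zlib.crc32(h['control']) == h['crc']), None)
    if good is None:
        raise IOError("all headers corrupt: control information lost")
    for h in headers:
        if zlib.crc32(h['control']) != h['crc']:
            h['control'], h['crc'] = good['control'], good['crc']
    return headers

ctrl = b"alloc-map v17"
headers = [{'control': ctrl, 'crc': zlib.crc32(ctrl)},
           {'control': b"garbage", 'crc': 0}]       # corrupted second header
print(repair_headers(headers)[1]['control'])        # b'alloc-map v17'
```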

Patent
Xin Hua Liu, Xiao Song Ran
29 Oct 2008
TL;DR: In this article, a method and equipment for reducing data corruption in a shared storage are disclosed; the storage comprises a header section and a data section, where the header section contains at least two headers storing control information.
Abstract: The invention discloses a method and equipment for reducing data corruption in a shared storage. The shared storage comprises a header section and a data section, wherein the header section comprises at least two headers that store control information. The method comprises the following steps: judging whether one of the at least two headers has data corruption; and, if so, copying the control information from any one of the other headers to that header. The invention also discloses a method for controlling access to the shared storage.

01 Jan 2008
TL;DR: The results from the first large-scale field study of data corruption are presented, which analyzes over 400,000 corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months.
Abstract: One of the biggest challenges in designing storage systems is providing the reliability and availability that users expect. A serious threat to reliability is silent data corruption (i.e., corruption not detected by the disk drive). In order to develop suitable protection mechanisms against corruption, it is essential to understand its characteristics. In this article, we present the results from the first large-scale field study of data corruption. We analyze over 400,000 corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months.
