scispace - formally typeset
Topic

Data Corruption

About: Data Corruption is a research topic. Over its lifetime, 435 publications have been published on this topic, receiving 6,784 citations.


Papers
Proceedings Article
11 Jun 2018
TL;DR: A systematic examination of a large set of data transfer log data to characterize transfer characteristics, including the nature of the datasets transferred, achieved throughput, user behavior, and resource usage yields new insights that can help design better data transfer tools, optimize networking and edge resources used for transfers, and improve the performance and experience for end users.
Abstract: Wide area data transfers play an important role in many science applications but rely on expensive infrastructure that often delivers disappointing performance in practice. In response, we present a systematic examination of a large set of data transfer logs to characterize transfers, including the nature of the datasets transferred, achieved throughput, user behavior, and resource usage. This analysis yields new insights that can help design better data transfer tools, optimize networking and edge resources used for transfers, and improve the performance and experience for end users. Our analysis shows that (i) most of the datasets, as well as most individual files, transferred are very small; (ii) data corruption is not negligible for large data transfers; and (iii) data transfer node utilization is low. Insights gained from our analysis suggest directions for further analysis.

53 citations
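The abstract reports that data corruption is not negligible for large transfers. A common generic mitigation (not necessarily what the transfer tools in this study use) is an end-to-end checksum: hash the source and the destination copy and compare. The sketch below assumes both files are locally readable; `sha256_of_file` and `verify_transfer` are hypothetical helper names, and SHA-256 with 1 MiB chunks is an illustrative choice.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large datasets never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(src_path: str, dst_path: str) -> bool:
    """Return True only if the destination copy is byte-identical to the source."""
    return sha256_of_file(src_path) == sha256_of_file(dst_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.bin")
        dst = os.path.join(d, "dst.bin")
        payload = os.urandom(3 * (1 << 20) + 17)  # deliberately not a multiple of the chunk size
        with open(src, "wb") as f:
            f.write(payload)
        # Simulate a single bit flipped somewhere in transit (hypothetical fault).
        corrupted = bytearray(payload)
        corrupted[12345] ^= 0x01
        with open(dst, "wb") as f:
            f.write(corrupted)
        print(verify_transfer(src, dst))  # False
```

In a real transfer service the destination hash would be computed on the receiving host and only the digests exchanged, so the check costs one extra read of each copy rather than a second transfer.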

Journal Article
TL;DR: The paper discusses the causes of undetected disk errors (UDEs) and their effects on data integrity, describes basic techniques that have been applied to address the problem at various software layers of the I/O stack, and presents a family of solutions that can be integrated into the RAID subsystem.
Abstract: Though remarkably reliable, disk drives do fail occasionally. Most failures can be detected immediately; moreover, such failures can be modeled and addressed using technologies such as RAID (Redundant Arrays of Independent Disks). Unfortunately, disk drives can experience errors that are undetected by the drive, which we refer to as undetected disk errors (UDEs). These errors can cause silent data corruption that may go completely undetected (until a system or application malfunction) or may be detected by software in the storage I/O stack. Continual increases in disk densities and storage array sizes, and more significantly the introduction of desktop-class drives in enterprise storage systems, are increasing the likelihood of UDEs in a given system. Therefore, incorporating UDE detection (and correction) into storage systems is necessary to prevent a growing number of data corruption and data loss events. In this paper, we discuss the causes of UDEs and their effects on data integrity. We describe some of the basic techniques that have been applied to address this problem at various software layers in the I/O stack and describe a family of solutions that can be integrated into the RAID subsystem.

51 citations
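One generic technique for catching some UDEs in software (an illustration, not the specific RAID-level scheme proposed in this paper) is to store, with each block, a checksum computed over both the payload and the block's intended address. A read can then distinguish bit corruption from a misdirected write, where a valid block lands at the wrong location. The block size, trailer layout, and the names `seal` and `check` below are hypothetical. Note that a lost (dropped) write is not caught this way, because the stale block is still self-consistent; detecting it needs a version number stored somewhere other than the block itself.

```python
import struct
import zlib

BLOCK_SIZE = 512                 # payload bytes per block (assumed)
TRAILER = struct.Struct("<QI")   # 8-byte logical block address + 4-byte CRC32


def seal(lba: int, payload: bytes) -> bytes:
    """Append a trailer binding the payload to its intended logical block address."""
    if len(payload) != BLOCK_SIZE:
        raise ValueError("payload must be exactly BLOCK_SIZE bytes")
    crc = zlib.crc32(struct.pack("<Q", lba) + payload)
    return payload + TRAILER.pack(lba, crc)


def check(lba: int, sealed: bytes) -> str:
    """Classify a block that was read back from address `lba`."""
    payload, trailer = sealed[:BLOCK_SIZE], sealed[BLOCK_SIZE:]
    stored_lba, stored_crc = TRAILER.unpack(trailer)
    if zlib.crc32(struct.pack("<Q", stored_lba) + payload) != stored_crc:
        return "corrupt"       # payload or trailer bits changed after sealing
    if stored_lba != lba:
        return "misdirected"   # a valid block, but it was meant for another address
    return "ok"


if __name__ == "__main__":
    disk = {3: seal(3, b"A" * BLOCK_SIZE)}
    print(check(3, disk[3]))                   # ok
    disk[3] = seal(7, b"B" * BLOCK_SIZE)       # a write meant for block 7 lands on block 3
    print(check(3, disk[3]))                   # misdirected
    damaged = bytearray(disk[3])
    damaged[0] ^= 0xFF                         # flip bits in the payload
    print(check(7, bytes(damaged)))            # corrupt
```

Covering the address in the CRC means a corrupted trailer is reported as corruption rather than being mistaken for a misdirected write.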

Journal Article
TL;DR: This research presents a meta-modelling architecture that automates the labor-intensive, time-consuming, and therefore expensive process of manually cataloging and reprogramming DRAM modules for use in compute clusters.
Abstract: Errors in dynamic random access memory (DRAM) are a common form of hardware failure in modern compute clusters. Failures are costly both in terms of hardware replacement costs and service disruptio...

48 citations

Patent
03 Sep 1996
TL;DR: The methods of the present invention use heuristic analysis to locate and identify buffers and buffer descriptors accessed within captured state logic data, despite the time dispersion of those accesses.
Abstract: Methods and associated apparatus for analyzing and presenting captured state logic data including memory accesses by an intelligent I/O interface device and an attached computer system. The data analysis and display of the present invention aids an engineer in locating data corruption failures in a system. The heuristic analysis of the methods of the present invention locate and identify buffers accessed within the captured state logic data and buffer descriptors accessed within the captured state logic data despite the time dispersion thereof. The buffers and buffer descriptors located and identified within the captured state logic data are displayed on a computer display screen in a manner to more effectively assist an engineer in locating a root cause of data corruption than was possible with prior methods devoid of the analysis of the present invention. In particular, the display visually identifies buffers regardless of the state/time dispersion in the original captured state logic data and distinguishes read access from write access thereto. The display includes indicia used to associate a located and identified buffer descriptor with the identified buffer to which it refers. In response to user requests, the data contained in a selected buffer or selected buffers may be textually displayed either in a raw form or in accordance with the protocol specifications of the underlying data exchange application being debugged. The identified buffers may also be easily searched for a user specified string without concern for the time dispersion of the buffers in the captured state logic data.

48 citations

Proceedings Article
17 Feb 2014
TL;DR: ViewBox, an integrated synchronization service and local file system, provides freedom from data corruption and inconsistency: it detects and recovers from both while incurring minimal overhead.
Abstract: Cloud-based file synchronization services have become enormously popular in recent years, both for their ability to synchronize files across multiple clients and for the automatic cloud backups they provide. However, despite the excellent reliability that the cloud back-end provides, the loose coupling of these services and the local file system makes synchronized data more vulnerable than users might believe. Local corruption may be propagated to the cloud, polluting all copies on other devices, and a crash or untimely shutdown may lead to inconsistency between a local file and its cloud copy. Even without these failures, these services cannot provide causal consistency. To address these problems, we present ViewBox, an integrated synchronization service and local file system that provides freedom from data corruption and inconsistency. ViewBox detects these problems using ext4-cksum, a modified version of ext4, and recovers from them using a user-level daemon, cloud helper, to fetch correct data from the cloud. To provide a stable basis for recovery, ViewBox employs the view manager on top of ext4-cksum. The view manager creates and exposes views, consistent in-memory snapshots of the file system, which the synchronization client then uploads. Our experiments show that ViewBox detects and recovers from both corruption and inconsistency, while incurring minimal overhead.

48 citations
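The detect-then-recover idea in ViewBox can be illustrated in a few lines: record a checksum when data is written, verify it on every read, repair a bad local copy from the cloud copy, and never upload data that fails verification. This is a simplified in-memory sketch, not ViewBox itself: ViewBox uses block checksums in ext4-cksum and a cloud-helper daemon, and the view manager's consistent snapshots are not modeled here. `CheckedStore`, `digest`, and the dict standing in for the cloud are hypothetical.

```python
import hashlib
from typing import Dict


def digest(data: bytes) -> str:
    """Checksum recorded at write time (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


class CheckedStore:
    """Hypothetical local store that verifies checksums before use or upload."""

    def __init__(self, cloud: Dict[str, bytes]):
        self.data: Dict[str, bytes] = {}   # local file contents
        self.sums: Dict[str, str] = {}     # checksums kept as separate metadata
        self.cloud = cloud                 # stand-in for the remote copy

    def write(self, name: str, content: bytes) -> None:
        self.data[name] = content
        self.sums[name] = digest(content)

    def read(self, name: str) -> bytes:
        content = self.data[name]
        if digest(content) == self.sums[name]:
            return content
        # Local copy is corrupt: repair it from the cloud if that copy is valid.
        remote = self.cloud.get(name)
        if remote is None or digest(remote) != self.sums[name]:
            raise OSError(f"{name}: corrupt locally and no valid remote copy")
        self.data[name] = remote
        return remote

    def sync(self, name: str) -> None:
        """Upload only verified data, so local corruption never reaches the cloud."""
        self.cloud[name] = self.read(name)


if __name__ == "__main__":
    cloud: Dict[str, bytes] = {}
    store = CheckedStore(cloud)
    store.write("notes.txt", b"hello")
    store.sync("notes.txt")
    store.data["notes.txt"] = b"hellp"     # simulate silent local corruption
    print(store.read("notes.txt"))         # b'hello' (repaired from the cloud)
```

Routing `sync` through `read` is the design point: an unverifiable local file raises an error instead of overwriting the good cloud copy, which is exactly the propagation problem the abstract describes.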


Network Information
Related Topics (5)
Network packet: 159.7K papers, 2.2M citations (82% related)
Software: 130.5K papers, 2M citations (81% related)
Wireless sensor network: 142K papers, 2.4M citations (78% related)
Wireless network: 122.5K papers, 2.1M citations (77% related)
Cluster analysis: 146.5K papers, 2.9M citations (76% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    21
2020    25
2019    27
2018    27
2017    27