
Data Corruption

About: Data Corruption is a research topic. Over its lifetime, 435 publications have been published on this topic, receiving 6,784 citations.


Papers
Proceedings ArticleDOI
05 Mar 2011
TL;DR: Flikker exposes and leverages an interesting trade-off between energy consumption and hardware correctness, showing that many applications are naturally tolerant to errors in non-critical data and that, in the vast majority of cases, these errors have little or no impact on the application's final outcome.
Abstract: Energy has become a first-class design constraint in computer systems. Memory is a significant contributor to total system power. This paper introduces Flikker, an application-level technique to reduce refresh power in DRAM memories. Flikker enables developers to specify critical and non-critical data in programs, and the runtime system allocates this data in separate parts of memory. The portion of memory containing critical data is refreshed at the regular refresh rate, while the portion containing non-critical data is refreshed at substantially lower rates. This partitioning saves energy at the cost of a modest increase in data corruption in the non-critical data. Flikker thus exposes and leverages an interesting trade-off between energy consumption and hardware correctness. We show that many applications are naturally tolerant to errors in the non-critical data, and that in the vast majority of cases, the errors have little or no impact on the application's final outcome. We also find that Flikker can save 20-25% of the power consumed by the memory subsystem in a mobile device, with negligible impact on application performance. Flikker is implemented almost entirely in software, and requires only modest changes to the hardware.

457 citations
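
The abstract above explains Flikker's mechanism but not its interface. The sketch below is a toy model of the core idea: a heap split into a critical pool (refreshed normally, never corrupted here) and a non-critical pool (refreshed slowly, modeled by random bit flips). The class and method names are invented for illustration and are not Flikker's actual API.

```python
import random

class FlikkerStyleHeap:
    """Toy model of critical/non-critical data partitioning.

    Hypothetical API, not Flikker's actual interface: data marked
    non-critical lives in a low-refresh region, modeled here by
    injecting random single-bit flips at a given error rate.
    """

    def __init__(self, noncritical_bit_error_rate=1e-6):
        self.err_rate = noncritical_bit_error_rate
        self.critical = {}      # refreshed normally: never corrupted
        self.noncritical = {}   # refreshed slowly: may be corrupted

    def store(self, key, value: bytes, critical: bool):
        (self.critical if critical else self.noncritical)[key] = bytearray(value)

    def load(self, key) -> bytes:
        if key in self.critical:
            return bytes(self.critical[key])
        buf = self.noncritical[key]
        for i in range(len(buf) * 8):           # model low-refresh decay
            if random.random() < self.err_rate:
                buf[i // 8] ^= 1 << (i % 8)     # flip one bit
        return bytes(buf)

heap = FlikkerStyleHeap(noncritical_bit_error_rate=1e-4)
heap.store("header", b"must-be-exact", critical=True)
heap.store("pixels", bytes(1024), critical=False)   # tolerates bit flips
```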

Journal ArticleDOI
TL;DR: This article presents the first large-scale study of data corruption, analyzing corruption instances recorded in production storage systems containing a total of 1.53 million disk drives over a period of 41 months.
Abstract: An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this article, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur the most. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances, including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.

312 citations
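
The mismatches counted in this study are detected by block-level checksums in the storage stack. A minimal sketch of that detection path follows, using CRC32 purely as an illustrative checksum (the studied systems use their own block checksums), with a toy dict standing in for the disk:

```python
import zlib

BLOCK_SIZE = 4096

def write_block(dev: dict, lba: int, data: bytes) -> None:
    """Store the block together with its checksum (dev is a toy dict 'disk')."""
    assert len(data) == BLOCK_SIZE
    dev[lba] = (data, zlib.crc32(data))

def read_block(dev: dict, lba: int) -> bytes:
    """Recompute the checksum on read; a mismatch means silent corruption."""
    data, stored_crc = dev[lba]
    if zlib.crc32(data) != stored_crc:
        raise IOError(f"checksum mismatch at LBA {lba}: "
                      "block was silently corrupted")
    return data

disk = {}
write_block(disk, 0, bytes(BLOCK_SIZE))
data, crc = disk[0]
disk[0] = (data[:100] + b"\x01" + data[101:], crc)  # inject silent corruption
try:
    read_block(disk, 0)
except IOError as e:
    print(e)
```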

Patent
12 May 2003
TL;DR: In this patent, the quality of data stored in a memory system is assessed by several methods, and the memory system operates according to the assessed quality; corrective actions can then be implemented specifically on the poor-quality data, according to suitably chosen schedules.
Abstract: The quality of data stored in a memory system is assessed by different methods, and the memory system is operated according to the assessed quality. The data quality can be assessed during read operations. Subsequent use of an Error Correction Code can utilize the quality indications to detect and reconstruct the data with improved effectiveness. Alternatively, statistics of data quality can be compiled and digital data values can be associated in a modified manner to prevent data corruption. In both cases the corrective actions can be implemented specifically on the poor-quality data, according to suitably chosen schedules, and with improved effectiveness because of the knowledge provided by the quality indications. These methods can be especially useful in high-density memory systems constructed of multi-level storage memory cells.

281 citations
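
The patent text above does not fix a concrete algorithm, but one corrective action it describes, scheduling work specifically on poor-quality data, can be sketched as follows. The quality signal used here (ECC-corrected bit counts per read) and the threshold are illustrative assumptions:

```python
import heapq

class QualityTracker:
    """Toy scheduler in the spirit of the patent: blocks whose reads need
    many ECC corrections are treated as low quality and are scrubbed or
    relocated first. The scoring and threshold are illustrative assumptions."""

    def __init__(self, scrub_threshold=4):
        self.threshold = scrub_threshold
        self.corrections = {}   # block -> worst corrected-bit count observed

    def record_read(self, block: int, corrected_bits: int) -> None:
        prev = self.corrections.get(block, 0)
        self.corrections[block] = max(prev, corrected_bits)

    def scrub_schedule(self):
        """Yield blocks over the threshold, worst quality first."""
        bad = [(-n, b) for b, n in self.corrections.items() if n >= self.threshold]
        heapq.heapify(bad)
        while bad:
            n, b = heapq.heappop(bad)
            yield b, -n

tracker = QualityTracker()
tracker.record_read(block=7, corrected_bits=6)   # poor quality
tracker.record_read(block=2, corrected_bits=1)   # healthy
for block, worst in tracker.scrub_schedule():
    print(f"scrub/relocate block {block} (worst corrected bits: {worst})")
```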

Patent
27 Jun 2006
TL;DR: This patent describes a way for multiple processes to operate in parallel on (e.g., by reading, modifying, and writing to) the same shared data without corrupting it.
Abstract: The present disclosure describes a unique way for each of multiple processes to operate in parallel using (e.g., reading, modifying, and writing to) the same shared data without causing corruption to the shared data. For example, each of multiple processes utilizes current and past data values associated with a global counter or clock to determine whether any shared variables used to produce a respective transaction outcome were modified (by another process) while executing that transaction. If a process detects that shared data it used was modified during a transaction, the process can abort and retry the transaction rather than cause data corruption by storing locally maintained results associated with the transaction to a globally shared data space.

220 citations
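
The mechanism described, a global counter/clock plus validate-and-retry, resembles a version-clock software transactional memory (in the style of TL2). The following is a minimal sketch under that reading; all names are invented, and a single commit lock stands in for the finer-grained locking a real implementation would use:

```python
import threading

_clock = 0
_clock_lock = threading.Lock()

class TVar:
    """A shared variable stamped with the global clock value of its last write."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Abort(Exception):
    pass

def atomically(txn):
    """Run txn(read, write) until it commits without conflicts."""
    global _clock
    while True:
        start = _clock                  # snapshot of the global clock
        reads, writes = {}, {}

        def read(tv: TVar):
            if tv in writes:
                return writes[tv]
            if tv.version > start:      # written after we started: stale
                raise Abort()
            reads[tv] = tv.version
            return tv.value

        def write(tv: TVar, value):
            writes[tv] = value

        try:
            result = txn(read, write)
            with _clock_lock:           # commit: revalidate the read set
                if any(tv.version != v for tv, v in reads.items()):
                    raise Abort()
                _clock += 1
                for tv, value in writes.items():
                    tv.value, tv.version = value, _clock
            return result
        except Abort:
            continue                    # conflict: retry rather than corrupt

balance = TVar(100)

def withdraw(read, write):
    write(balance, read(balance) - 10)

atomically(withdraw)
print(balance.value)  # 90
```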

Journal ArticleDOI
TL;DR: A data fusion algorithm that uses basic notions from traffic flow theory, is generic in that it imposes no restrictions on the temporal or spatial structure of the data, produces accurately reconstructed traffic conditions, and is robust to increasing degrees of data corruption.
Abstract: Fusing freeway traffic data such as spot speeds and travel times from a variety of traffic sensors (loops, cameras, automated vehicle identification systems) into a coherent, consistent, and reliable picture of the prevailing traffic conditions (e.g., speeds, flows) is a critical task in any off- or online traffic management or data archival system. This task is challenging as such data differ in terms of spatial and temporal resolution, accuracy, reliability, and most importantly in terms of spatiotemporal semantics. In this article, we propose a data fusion algorithm (the extended generalized Treiber-Helbing filter [the EGTF]) which, although heuristic in nature, uses basic notions from traffic flow theory and is generic in the sense that it does not impose any restrictions on the way the data are structured in a temporal or spatial way. This implies that the data can stem from any data source, given they provide a means to distinguish between free-flowing and congested traffic. On the basis of (ground truth and sensor) data from a micro-simulation tool, we demonstrate that the EGTF method results in accurate reconstructed traffic conditions and is robust to increasing degrees of data corruption. Further research should focus on validating the approach on real data. The method can be straightforwardly implemented in any traffic data archiving system or application which requires consistent and coherent traffic data from traffic sensors as inputs.

172 citations
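
The EGTF extends the Treiber-Helbing adaptive smoothing filter, which reconstructs speeds by smoothing measurements along free-flow and congested wave characteristics and blending the two estimates. The sketch below shows that underlying filter on toy detector data; the wave speeds, kernel widths, and thresholds are illustrative values, not the paper's calibrated settings:

```python
import numpy as np

# Illustrative parameters for the adaptive smoothing idea the EGTF extends.
C_FREE, C_CONG = 80.0, -18.0     # characteristic wave speeds [km/h]
SIGMA, TAU = 0.6, 1.0 / 60.0     # kernel widths [km], [h]
V_THR, DV = 60.0, 20.0           # congestion threshold / transition width [km/h]

def kernel(dx, dt):
    return np.exp(-np.abs(dx) / SIGMA - np.abs(dt) / TAU)

def fuse_speed(x, t, xs, ts, vs):
    """Reconstruct the speed at (x, t) from scattered measurements (xs, ts, vs)
    by smoothing along free-flow and congested characteristics, then blending
    with a weight that favors the congested estimate at low speeds."""
    w_free = kernel(xs - x, ts - t - (xs - x) / C_FREE)
    w_cong = kernel(xs - x, ts - t - (xs - x) / C_CONG)
    v_free = np.sum(w_free * vs) / np.sum(w_free)
    v_cong = np.sum(w_cong * vs) / np.sum(w_cong)
    w = 0.5 * (1 + np.tanh((V_THR - min(v_free, v_cong)) / DV))
    return w * v_cong + (1 - w) * v_free

# Toy data: detector speeds at positions xs [km] and times ts [h]
xs = np.array([0.0, 0.5, 1.0, 1.5])
ts = np.array([0.00, 0.01, 0.02, 0.03])
vs = np.array([95.0, 90.0, 30.0, 25.0])   # a jam forms downstream
print(fuse_speed(x=1.0, t=0.02, xs=xs, ts=ts, vs=vs))
```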


Network Information
Related Topics (5)
Network packet: 159.7K papers, 2.2M citations (82% related)
Software: 130.5K papers, 2M citations (81% related)
Wireless sensor network: 142K papers, 2.4M citations (78% related)
Wireless network: 122.5K papers, 2.1M citations (77% related)
Cluster analysis: 146.5K papers, 2.9M citations (76% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    21
2020    25
2019    27
2018    27
2017    27