
Showing papers on "Data Corruption" published in 2013


Journal ArticleDOI
TL;DR: Cognizance of the causes of imperfect social network data, the importance of proper boundary specification, biases introduced via the employed data collection methods, and characteristics of social network information sources are necessary for SNA analysts to ascertain the resultant social network model's limitations and the inferences that can properly be drawn from the analysis.
Abstract: Social network analysis (SNA) conclusions are drawn on terrorist and dark network data sets that may provide erroneous results due to an indeterminate amount of missing data or data corruption. Compounding these effects, information sources reporting on terrorist groups and other dark network organizations may intentionally or unintentionally provide false data. These introduced errors may be significant as they could produce analytic results that are counter to the true situation, leading to misappropriation of resources, improper strategy adoption, and erroneous actions. Analyst cognizance of the causes of imperfect social network data, the importance of proper boundary specification, biases introduced via the employed data collection methods, and characteristics of social network information sources, particularly inherent informant accuracy assumptions, are necessary for SNA analysts to ascertain the resultant social network model's limitations and the inferences that can properly be drawn from the analysis.

16 citations


Patent
Yan Li
25 Jan 2013
TL;DR: In this paper, a system and methods for programming a set of data onto non-volatile memory elements, maintaining copies of the data pages to be programmed, as well as surrounding data pages, internally or externally to the memory circuit, verifying programming correctness after programming, and upon discovering a programming error, recovering the safe copies of corrupted data to be reprogrammed in alternative non-volatile memory elements.
Abstract: A system and methods for programming a set of data onto non-volatile memory elements, maintaining copies of the data pages to be programmed, as well as surrounding data pages, internally or externally to the memory circuit, verifying programming correctness after programming, and upon discovering programming error, recovering the safe copies of the corrupted data to be reprogrammed in alternative non-volatile memory elements. Additionally, a system and methods for programming one or more sets of data across multiple die of a non-volatile memory system, combining data pages across the multiple die by means such as the XOR operation prior to programming the one or more sets of data, employing various methods to determine the correctness of programming, and upon identifying data corruption, recovering safe copies of data pages by means such as XOR operation to reprogram the pages in an alternate location on the non-volatile memory system.
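A minimal sketch of the XOR-combination idea described in this listing: one parity page is formed by XOR-ing the pages destined for each die, so a page that fails post-programming verification can be rebuilt from the surviving pages plus the parity. Page size, die count, and all function names are illustrative assumptions, not the patent's actual implementation.

```python
import os

PAGE_SIZE = 4096          # illustrative page size

def xor_pages(pages):
    """Bytewise XOR of equally sized pages."""
    out = bytearray(PAGE_SIZE)
    for page in pages:
        for i, b in enumerate(page):
            out[i] ^= b
    return bytes(out)

def make_parity(pages_per_die):
    """Parity page combining the pages destined for each die."""
    return xor_pages(pages_per_die)

def recover_page(surviving_pages, parity_page):
    """Rebuild the one corrupted page: XOR the parity with every surviving page."""
    return xor_pages(surviving_pages + [parity_page])

pages = [os.urandom(PAGE_SIZE) for _ in range(4)]     # one page per die
parity = make_parity(pages)
lost = pages[2]                                       # suppose die 2 fails verification
assert recover_page(pages[:2] + pages[3:], parity) == lost
```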

15 citations


Book ChapterDOI
05 Feb 2013
TL;DR: This paper will look at two distinct but related scenarios: migration to archival - leveraging on existing replicated data to create an erasure encoded archive, and data insertion - new data being inserted in the system directly in erasure coded format.
Abstract: Given the vast volume of data that needs to be stored reliably, many data-centers and large-scale file systems have started using erasure codes to achieve reliable storage while keeping the storage overhead low. This has invigorated the research on erasure codes tailor made to achieve different desirable storage system properties such as efficient redundancy replenishment mechanisms, resilience against data corruption, degraded reads, to name a few prominent ones. A problem that has mainly been overlooked until recently is that of how the storage system can be efficiently populated with erasure coded data to start with. In this paper, we will look at two distinct but related scenarios: (i) migration to archival - leveraging on existing replicated data to create an erasure encoded archive, and (ii) data insertion - new data being inserted in the system directly in erasure coded format. We will elaborate on coding techniques to achieve better throughput for data insertion and migration, and in doing so, explore the connection of these techniques with recently proposed locally repairable codes such as self-repairing codes.
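As a rough illustration of the "migration to archival" scenario, the sketch below builds a systematic archive with a single XOR parity block by folding each existing replica's block into a running partial parity, so the parity can be accumulated along a pipeline of replica holders. This is a deliberately simplified stand-in for the locally repairable and self-repairing codes the chapter discusses; all names are assumptions.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pipelined_parity(replica_blocks):
    """Each replica folds its block into the running partial parity in turn."""
    return reduce(xor_blocks, replica_blocks)

def archive(replica_blocks):
    """Systematic (k, k+1) archive: the original blocks plus one parity block."""
    return list(replica_blocks) + [pipelined_parity(replica_blocks)]

blocks = [bytes([i]) * 8 for i in range(4)]
coded = archive(blocks)
# Any single missing data block is rebuilt from the remaining blocks and the parity.
missing = 1
rebuilt = reduce(xor_blocks, [b for j, b in enumerate(coded) if j != missing])
assert rebuilt == blocks[missing]
```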

13 citations


Patent
01 May 2013
TL;DR: In this paper, a hot-swap data storage device is released to be manually physically removed from the operable position within the chassis bay of the computer system in response to determining that the data storage device is not active.
Abstract: A method and computer program product secure a hot-swap data storage device against being manually physically removed from an operable position within a chassis bay of a computer system. The hot-swap data storage device is released to be manually physically removed from the operable position within the chassis bay of the computer system in response to determining that the data storage device is not active. The hot-swap data storage device may, for example, be secured and released using an electronically-actuated lock.

10 citations


Journal ArticleDOI
TL;DR: This paper develops a strategy based on Sabattini et al., Decentralized Connectivity Maintenance for Networked Lagrangian Dynamical Systems, 2012 for guaranteeing connectivity in the presence of data corruption.

9 citations


Proceedings ArticleDOI
29 Apr 2013
TL;DR: This paper proposes an effective and flexible distributed scheme with explicit dynamic data support, including block update, delete, and append, and relies on erasure-correcting code in the file distribution preparation to provide redundancy parity vectors and guarantee the data dependability.
Abstract: In this paper, we investigate the problem of data security in cloud data storage, which is essentially a distributed storage system. To achieve the assurances of cloud data integrity and availability and enforce the quality of dependable cloud storage service for users, we propose an effective and flexible distributed scheme with explicit dynamic data support, including block update, delete, and append. We rely on erasure-correcting code in the file distribution preparation to provide redundancy parity vectors and guarantee the data dependability. By utilizing the token with distributed verification of coded data, our scheme achieves the integration of storage correctness insurance and data error localization, i.e., whenever data corruption has been detected during the storage correctness verification across the distributed servers, we can almost guarantee the simultaneous identification of the misbehaving server(s). Considering the time, computation resources, and even the related online burden of users, we also provide the extension of the proposed main scheme to support third-party auditing, where users can safely delegate the integrity checking tasks to third-party auditors and be worry-free to use the cloud storage services. Through detailed security analysis and extensive experiment results, we show that our scheme is highly efficient and resilient to Byzantine failure, malicious data modification attack, and even server colluding attacks.
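The following is a hedged sketch of the token-verification idea only, not the paper's homomorphic token construction: the user precomputes a keyed MAC per server over the blocks that server should hold, and a later challenge localizes misbehaving servers by recomputing the MACs. Key handling, block layout, and function names are illustrative assumptions.

```python
import hmac, hashlib

def token(key: bytes, server_id: int, blocks) -> bytes:
    """Keyed MAC over the blocks a given server is supposed to store."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    mac.update(server_id.to_bytes(4, "big"))
    for blk in blocks:
        mac.update(blk)
    return mac.digest()

def localize_errors(key, expected_tokens, servers):
    """Ids of servers whose stored blocks no longer match their precomputed token."""
    return [sid for sid, blocks in servers.items()
            if not hmac.compare_digest(token(key, sid, blocks), expected_tokens[sid])]

key = b"user-secret"
servers = {0: [b"a0", b"a1"], 1: [b"b0", b"b1"], 2: [b"c0", b"c1"]}
expected = {sid: token(key, sid, blks) for sid, blks in servers.items()}
servers[1][0] = b"XX"                       # simulate corruption on server 1
assert localize_errors(key, expected, servers) == [1]
```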

9 citations


Patent
18 Jun 2013
TL;DR: In this article, a method of operating a data processing system comprises: processing data words and switching between contexts; assigning a context signature Sig to any pair formed of a data word and a context; reading, within a current context, a data record from a memory unit, the data record comprising a payload data word and a protection signature; providing, as a verification signature, the context signature of the payload data word and the current context; and checking the verification signature against the protection signature.
Abstract: A method of operating a data processing system comprises: processing data words and switching between contexts; assigning a context signature Sig to any pair formed of a data word and a context; reading, within a current context, a data record from a memory unit, the data record comprising a payload data word and a protection signature; providing, as a verification signature, the context signature Sig of the payload data word and the current context; checking the verification signature against the protection signature; and generating an error signal if the verification signature differs from the protection signature.
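A minimal sketch of the context-signature check, assuming the signature Sig is simply a truncated hash of the (payload word, context) pair; the patent does not fix a particular signature function, so this choice and the helper names are assumptions.

```python
import hashlib

def sig(word: int, context: int) -> bytes:
    """Context signature Sig: a truncated hash of (payload word, context id)."""
    return hashlib.sha256(word.to_bytes(8, "big") + context.to_bytes(4, "big")).digest()[:4]

def write_record(memory: dict, addr: int, word: int, context: int) -> None:
    memory[addr] = (word, sig(word, context))       # payload + protection signature

def read_record(memory: dict, addr: int, current_context: int) -> int:
    word, protection = memory[addr]
    verification = sig(word, current_context)       # recomputed within the current context
    if verification != protection:
        raise RuntimeError("error signal: verification signature differs from protection signature")
    return word

memory = {}
write_record(memory, 0x10, 0xDEADBEEF, context=7)
assert read_record(memory, 0x10, current_context=7) == 0xDEADBEEF
# Reading the same record from a different context raises the error signal.
```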

9 citations


Proceedings ArticleDOI
03 Nov 2013
TL;DR: This paper discusses the problem of hardening existing code bases of distributed systems transparently, and identifies and discusses three areas of improvement: reducing the memory overhead, improving access to state variables, and supporting multi-threading.
Abstract: In distributed systems, errors such as data corruption or arbitrary changes to the flow of programs might cause processes to propagate incorrect state across the system. To prevent error propagation in such systems, an efficient and effective technique is to harden processes against Arbitrary State Corruption (ASC) faults through local detection, without replication. For distributed systems designed from scratch, dealing with state corruption can be made fully transparent, but requires that developers follow a few concrete design patterns. In this paper, we discuss the problem of hardening existing code bases of distributed systems transparently. Existing systems have not been designed with ASC hardening in mind, so they do not necessarily follow required design patterns. For such systems, we focus here on both performance and number of changes to the existing code base. Using memcached as an example, we identify and discuss three areas of improvement: reducing the memory overhead, improving access to state variables, and supporting multi-threading. Our initial evaluation of memcached shows that our ASC-hardened version obtains a throughput that is roughly 76% of the throughput of stock memcached with 128-byte and 1k-byte messages.

8 citations


01 Jan 2013
TL;DR: Whenever data corruption has been detected during the storage correctness verification, the scheme can almost guarantee the simultaneous localization of data errors, i.e., the identification of the misbehaving server(s).
Abstract: Cloud computing is the newest term for the long-dreamed vision of computing as a utility. The cloud provides convenient, on-demand network access to a centralized pool of configurable computing resources that can be rapidly deployed with great efficiency and minimal management overhead. The industry leaders and customers have wide-ranging expectations for cloud computing, in which security concerns remain a major aspect. Dealing with “single cloud” providers is becoming less popular with customers due to potential problems such as service availability failure and the possibility that there are malicious insiders in the single cloud. In recent years, there has been a move towards “multiclouds”, “intercloud” or “cloud-of-clouds”. The proposed design allows users to audit the cloud storage with very lightweight communication and computation cost. Our scheme achieves the storage correctness insurance as well as data error localization: whenever data corruption has been detected during the storage correctness verification, our scheme can almost guarantee the simultaneous localization of data errors, i.e., the identification of the misbehaving server(s).

6 citations


01 Jan 2013
TL;DR: The benefit of control systems with network architecture over traditional systems with a central processor is described and a suitable standard protocol, CAN, is briefly presented and its current and future use in automobile machines is discussed.
Abstract: This paper describes the benefit of control systems with network architecture over traditional systems with a central processor. A suitable standard protocol, CAN, is briefly presented and its current and future use in automobile machines is discussed. An important task is to find a way to make it possible to use standard network modules from different producers in a network specially designed for a specific machine. A solution to this problem is the "CAN Kingdom" set of design rules, the basics of which are presented here. The model consists of data collection and data display modules which are independent of each other and remotely executed. The data collection module receives data through the CAN bus, and the data display module displays these data via a GUI designed with Qt/Embedded on a microcontroller. A temperature sensor and IR sensors are used as data collection agents, and an LCD and a motor are used as output agents in an application that demonstrates the effectiveness of CAN. To avoid data corruption and redundancy and to reduce circuit complexity, the MCP2510 is used to extend CAN. The whole CAN bus system is made up of the MCP2510, a stand-alone CAN controller with an SPI interface, and the MCP2551, which acts as an interface between the CAN controller and the physical bus that carries the data. The MCP2510 is capable of both acceptance filtering and message management. It includes three transmit buffers and two receive buffers that reduce the amount of microcontroller (MCU) management required. The MCU communication is implemented via an industry-standard Serial Peripheral Interface (SPI) with data rates up to 5 Mb/s.
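A hedged sketch of how an MCU-side driver might talk to an external MCP2510 over SPI using the Linux spidev module. The opcode values are recalled from the MCP2510/MCP2515 instruction set and, together with the bus and chip-select numbers, should be treated as assumptions to verify against the datasheet.

```python
import spidev

RESET, WRITE, READ, READ_STATUS = 0xC0, 0x02, 0x03, 0xA0   # assumed MCP2510 opcodes

spi = spidev.SpiDev()
spi.open(0, 0)                 # bus 0, chip-select 0 (board-specific assumption)
spi.max_speed_hz = 5_000_000   # the MCP2510 SPI is specified up to 5 Mb/s

def mcp_reset():
    spi.xfer2([RESET])

def mcp_write(addr, data_bytes):
    """Write one or more consecutive controller registers starting at addr."""
    spi.xfer2([WRITE, addr] + list(data_bytes))

def mcp_read(addr, length):
    """Read 'length' consecutive registers; the first two returned bytes are padding."""
    return spi.xfer2([READ, addr] + [0x00] * length)[2:]

def mcp_status():
    return spi.xfer2([READ_STATUS, 0x00])[1]
```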

6 citations


Journal ArticleDOI
TL;DR: The proposed approach stores a modified duplicate of the narrow value such that errors on the original value and the duplicate can be distinguished and therefore corrected; the scheme is significantly faster than a parity check and can substantially increase the number of soft errors that are corrected compared to existing techniques.
Abstract: Soft errors are transient errors that can alter the logic value of a register bit causing data corruption. They can be caused by radiation particles such as neutrons or alpha particles. Narrow values are commonly found in the data consumed or produced by processors. Several techniques have recently been proposed to exploit the unused bits in narrow values to protect them against soft errors. These techniques replicate the narrow value over the unused register bits such that errors can be detected when the value is duplicated and corrected when the value is tripled. In this letter, a technique that can correct errors when the narrow value is only duplicated is presented. The proposed approach stores a modified duplicate of the narrow value such that errors on the original value and the duplicate can be distinguished and therefore corrected. The scheme has been implemented at the circuit level to evaluate its speed and also at the architectural level to assess the benefits in correcting soft errors. The results show that the scheme is significantly faster than a parity check and can improve substantially the number of soft errors that are corrected compared to existing techniques.
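The letter's exact modified-duplicate encoding is not reproduced here; as a stand-in that still corrects single-bit upsets with only one extra copy plus one parity bit, the sketch below uses the parity of the original narrow value to decide whether the original or the duplicate was hit. Function names and the test values are assumptions.

```python
def parity(x: int) -> int:
    return bin(x).count("1") & 1

def encode(value: int):
    """Store the narrow value, a duplicate, and one parity bit of the original."""
    return value, value, parity(value)

def decode(original: int, duplicate: int, p: int) -> int:
    if original == duplicate:
        return original                    # clean (or only the parity bit flipped)
    if parity(original) != p:
        return duplicate                   # the original copy was corrupted
    return original                        # the duplicate copy was corrupted

orig, dup, p = encode(0x2F)
assert decode(orig ^ 0x08, dup, p) == 0x2F     # upset in the original is corrected
assert decode(orig, dup ^ 0x01, p) == 0x2F     # upset in the duplicate is corrected
```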

Patent
04 Feb 2013
TL;DR: In this paper, the authors propose to prevent the occurrence of data corruption when the forwarding-source region and the forwarding-destination region of the data overlap, even when forwarding using a burst-forwarding function.
Abstract: The purpose of the present invention is to prevent the occurrence of data corruption when the forwarding-source region and the forwarding-destination region of the data overlap, and even when forwarding using a burst-forwarding function. Data read from the forwarding-source region is first written in a ring buffer, and then the data written in the ring buffer is written in the forwarding-destination region. When doing so, the reading of data from the ring buffer is controlled on the basis of the magnitude correlation between the number of wraparounds caused by the writing of data in the ring buffer, and the number of wraparounds caused by reading the data.
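An illustrative sketch (not the patent's circuit) of the overlap-safe burst copy: data is staged through a ring buffer, and draining the ring into the destination is gated by comparing the writer's and reader's wraparound counts, so destination writes never overtake the staging of overlapping source bytes. Buffer size, burst size, and names are assumptions.

```python
def overlap_copy(mem: bytearray, src: int, dst: int, length: int,
                 ring_size: int = 16, burst: int = 4) -> None:
    ring = bytearray(ring_size)
    staged = drained = 0              # bytes written into / read out of the ring so far
    while drained < length:
        # Stage bursts from the source region while the ring has room.
        while staged < length and (staged - drained) + burst <= ring_size:
            n = min(burst, length - staged)
            for i in range(n):
                ring[(staged + i) % ring_size] = mem[src + staged + i]
            staged += n
        # Wraparound counts for the writer and the reader of the ring.
        write_wraps, read_wraps = staged // ring_size, drained // ring_size
        # Drain only while the writer is ahead of the reader (compare the wrap
        # counts first, then the offsets inside the ring).
        if (write_wraps, staged % ring_size) > (read_wraps, drained % ring_size):
            n = min(burst, staged - drained)
            for i in range(n):
                mem[dst + drained + i] = ring[(drained + i) % ring_size]
            drained += n

mem = bytearray(range(64))
overlap_copy(mem, src=0, dst=8, length=32)        # source and destination overlap
assert mem[8:40] == bytearray(range(32))
```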

Proceedings ArticleDOI
01 Apr 2013
TL;DR: This paper considers two reliability threats: memory errors, where bits in DRAM are flipped due to cosmic rays, and software bugs, where programming errors may ultimately result in data corruption and crashes, and argues that by making use of checksums, it can significantly reduce the probability that either threat results in any application-visible effects.
Abstract: In this paper, we aim to improve the reliability of a central part of the operating system storage stack: the page cache. We consider two reliability threats: memory errors, where bits in DRAM are flipped due to cosmic rays, and software bugs, where programming errors may ultimately result in data corruption and crashes. We argue that by making use of checksums, we can significantly reduce the probability that either threat results in any application-visible effects. In particular, we can use checksums to detect memory corruption as well as validate the integrity of the cache's internal state for recovery after a crash. We show that in many cases, we can avoid the overhead of computing checksums especially for these purposes. We implement our ideas in the Loris storage stack. Our analysis and evaluation show that our approach improves the overall reliability of the cache at relatively little added cost.
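A small sketch of the checksum idea applied to a page cache (not the Loris implementation): each cached page carries a CRC32 computed at insertion and verified on every read, so a bit flipped in DRAM or a stray write from a buggy code path is detected before the corrupted page reaches the application. Class and method names are assumptions.

```python
import zlib

class ChecksummedPageCache:
    def __init__(self):
        self._pages = {}                       # page number -> (data, crc32)

    def put(self, pageno: int, data: bytes) -> None:
        self._pages[pageno] = (data, zlib.crc32(data))

    def get(self, pageno: int) -> bytes:
        data, crc = self._pages[pageno]
        if zlib.crc32(data) != crc:
            # Treat the cached copy as corrupted: drop it and force a re-read
            # from the underlying storage layer (not shown here).
            del self._pages[pageno]
            raise IOError(f"page {pageno}: checksum mismatch, cache entry discarded")
        return data

cache = ChecksummedPageCache()
cache.put(7, b"hello page")
assert cache.get(7) == b"hello page"
```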

Proceedings ArticleDOI
07 Dec 2013
TL;DR: This work presents a novel quantizer based upon optimization of comparison fidelity and a computationally tractable algorithm for its implementation on big datasets, providing the potential of increased real world performance across a wide range of existing data mining algorithms and applications.
Abstract: The abundance and value of mining large time series data sets has long been acknowledged. Ubiquitous in fields ranging from astronomy and biology to web science, the size and number of these datasets continue to increase, a situation exacerbated by the exponential growth of our digital footprints. The prevalence and potential utility of this data has led to a vast number of time-series data mining techniques, many of which require symbolization of the raw time series as a pre-processing step, for which a number of well-used, pre-existing approaches from the literature are typically employed. In this work we note that these standard approaches are sub-optimal in (at least) the broad application area of time series comparison, leading to unnecessary data corruption and potential performance loss before any real data mining takes place. Addressing this, we present a novel quantizer based upon optimization of comparison fidelity and a computationally tractable algorithm for its implementation on big datasets. We demonstrate empirically that our new approach provides a statistically significant reduction in the amount of error introduced by the symbolization process compared to the current state-of-the-art. The approach therefore provides a more accurate input for the vast number of data mining techniques in the literature, offering the potential of increased real-world performance across a wide range of existing data mining algorithms and applications.
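The paper's fidelity-optimized quantizer is not reproduced here; the sketch below shows the kind of standard symbolization step it improves upon: z-normalize the series and map values to a small alphabet via breakpoints (here data-driven quantiles, a SAX-like baseline). Alphabet size and names are assumptions.

```python
import numpy as np

def symbolize(series, alphabet_size=4):
    """Map a raw series to integer symbols 0..alphabet_size-1 (baseline quantizer)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)                 # z-normalize
    quantiles = np.linspace(0, 1, alphabet_size + 1)[1:-1] # equiprobable cut points
    breakpoints = np.quantile(x, quantiles)                # data-driven breakpoints
    return np.searchsorted(breakpoints, x)

symbols = symbolize([0.1, 0.5, 2.0, -1.2, 0.4, 0.3, 1.8, -0.9])
```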

01 Jan 2013
TL;DR: Precomputing tokens for large amounts of data improves the performance of error detection in storage services and is used in this paper to identify data corruption.

Abstract: A cloud structure delivers infrastructure, platform, and software as a service, which are made available in a pay-as-you-go model. Cloud storage is a model of networked online storage where data is stored in virtualized pools of storage which are generally hosted by third parties. Users interact with the cloud servers through Cloud Storage Providers (CSP) to access or retrieve the data. Users have no time to monitor the data online; hence, they entrust the auditing task to an optional Third Party Auditor (TPA). The main aim is to achieve the integration of storage correctness across multiple servers and data error localization. Homomorphic precomputation is used in this paper to identify data corruption. The user precomputes a token for the data file; the server computes a signature over specified blocks. If the signature does not match the precomputed token, this denotes data corruption. A Byzantine fault-tolerant algorithm, or data error localization algorithm, is used to identify in which server the data was corrupted or which server is not behaving properly. After detecting the error, the Reed-Solomon algorithm is used to recover the corrupted data. Precomputing tokens for a large amount of data improves the performance of error detection in storage services.
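A sketch of the detect-then-recover flow described above, with two simplifications flagged as assumptions: the homomorphic token is stood in for by a keyed SHA-256 over each server's block, and Reed-Solomon recovery is stood in for by a single XOR parity server that can rebuild one corrupted block.

```python
import hashlib
from functools import reduce

def token(key: bytes, block: bytes) -> bytes:
    return hashlib.sha256(key + block).digest()

def find_corrupted(key, stored_blocks, precomputed_tokens):
    """Indices of servers whose blocks no longer match the precomputed tokens."""
    return [i for i, blk in enumerate(stored_blocks)
            if token(key, blk) != precomputed_tokens[i]]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = b"k"
data = [b"srv0-blk", b"srv1-blk", b"srv2-blk"]
parity = reduce(xor, data)                       # held by a fourth, parity server
tokens = [token(key, blk) for blk in data]

stored = list(data)
stored[2] = b"garbage!"                          # server 2 misbehaves
bad = find_corrupted(key, stored, tokens)        # -> [2]
recovered = reduce(xor, [blk for i, blk in enumerate(stored) if i not in bad] + [parity])
assert bad == [2] and recovered == data[2]
```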

Journal ArticleDOI
TL;DR: A novel fault tolerant model of AES is presented which is based on the Hamming error correction code; it reduces data corruption due to Single Event Upsets and increases performance.
Abstract: The Advanced Encryption Standard (AES) has lately been accepted as the symmetric cryptography standard for confidential data transmission. The AES cipher is specified as a number of repetitions of transformation rounds that convert the input plain-text into the final output of cipher-text. Each round consists of several processing steps, including one that depends on the encryption key. A set of reverse rounds is applied to transform cipher-text back into the original plain-text using the same encryption key. The proposed schemes are independent of the way the S-box and the inverse S-box are constructed. Therefore, they can be used both for S-boxes and inverse S-boxes using lookup tables and for those utilizing logic gates based on composite fields. Furthermore, for each composite field construction, there exist eight possible isomorphic mappings. Therefore, after the exploitation of a new common subexpression elimination algorithm, the isomorphic mapping that results in the minimal implementation area cost is chosen. High-throughput hardware implementations of the proposed CFA AES S-boxes are reported. In order to avoid data corruption due to SEUs, a novel fault tolerant model of AES is presented which is based on the Hamming error correction code. This reduces data corruption and increases performance; thus, data corruption due to Single Event Upsets can be avoided and performance is increased.
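The abstract does not spell out how the Hamming code is attached to the AES datapath; as a hedged illustration, the sketch below protects a single state byte with a Hamming(12,8) single-error-correcting code, so an SEU that flips any one of the twelve stored bits is corrected before the byte re-enters the round computation. Function names and bit layout are assumptions.

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]       # parity bits sit at positions 1, 2, 4, 8

def hamming_encode(byte: int) -> int:
    """Hamming(12,8) encode one byte into a 12-bit codeword."""
    code = [0] * 13                                # index 0 unused
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (byte >> i) & 1
    for p in (1, 2, 4, 8):
        code[p] = sum(code[i] for i in range(1, 13) if i & p and i != p) & 1
    return sum(code[i] << (i - 1) for i in range(1, 13))

def hamming_decode(word: int) -> int:
    """Correct at most one flipped bit and return the original byte."""
    code = [0] + [(word >> (i - 1)) & 1 for i in range(1, 13)]
    syndrome = 0
    for p in (1, 2, 4, 8):
        if sum(code[i] for i in range(1, 13) if i & p) & 1:
            syndrome |= p
    if syndrome:                                   # single-bit error at position 'syndrome'
        code[syndrome] ^= 1
    return sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))

coded = hamming_encode(0xA5)
assert hamming_decode(coded ^ (1 << 6)) == 0xA5    # any single flipped bit is corrected
```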

01 Jan 2013
TL;DR: A novel fault tolerant model of AES is presented which is based on the Hamming error correction code and reduces data corruption due to Single Event Upsets while increasing performance.
Abstract: The Advanced Encryption Standard (AES) has lately been accepted as the symmetric cryptography standard for confidential data transmission. The AES cipher is specified as a number of repetitions of transformation rounds that convert the input plain-text into the final output of cipher-text. Each round consists of several processing steps, including one that depends on the encryption key. A set of reverse rounds is applied to transform cipher-text back into the original plain-text using the same encryption key. The proposed schemes are independent of the way the S-box and the inverse S-box are constructed. Therefore, they can be used both for S-boxes and inverse S-boxes using lookup tables and for those utilizing logic gates based on composite fields. Furthermore, for each composite field construction, there exist eight possible isomorphic mappings. Therefore, after the exploitation of a new common subexpression elimination algorithm, the isomorphic mapping that results in the minimal implementation area cost is chosen. High-throughput hardware implementations of our proposed CFA AES S-boxes are reported. In order to avoid data corruption due to SEUs, a novel fault tolerant model of AES is presented which is based on the Hamming error correction code. This reduces data corruption and increases performance; thus, data corruption due to Single Event Upsets can be avoided and performance is increased.

Patent
09 Oct 2013
TL;DR: In this article, a back-up and access method of data at an RFID electronic tag, wherein the method enables internal data of a RFID card to be accessed, verified, and checked effectively so as to ensure accuracy of information and have backup and effective data recovery functions.
Abstract: The invention relates to a back-up and access method of data at an RFID electronic tag, wherein the method enables internal data of an RFID card to be accessed, verified, and checked effectively so as to ensure accuracy of information and have back-up and effective data recovery functions. Information misoperation and data corruption of the RFID electronic tag can be prevented; and a POS machine wireless radio frequency sensing device senses the RFID electronic tag and information like data operating situation of the RFID electronic tag is recorded in a central management system to know the information state of the RFID electronic tag, thereby realizing convenient usage by the user. Meanwhile, data misuse or data corruption can be prevented; and consumption dining information security and reliability as well as full-automatic management are realized. When the method is used, effective access and back up of key data in an RFID electronic tag can be realized, thereby realizing a dual protection effect.
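A hedged sketch of the dual-protection idea only; the patent does not disclose its storage layout. Here the key data is written to the tag as two CRC-protected copies, so a read can verify each copy and restore a corrupted one from its backup. The record format and all names are assumptions.

```python
import zlib

def pack(data: bytes) -> bytes:
    return data + zlib.crc32(data).to_bytes(4, "big")

def unpack(record: bytes):
    data, crc = record[:-4], int.from_bytes(record[-4:], "big")
    return data if zlib.crc32(data) == crc else None

def write_tag(tag: dict, key_data: bytes) -> None:
    tag["primary"] = pack(key_data)               # working copy
    tag["backup"] = pack(key_data)                # back-up copy

def read_tag(tag: dict) -> bytes:
    primary, backup = unpack(tag["primary"]), unpack(tag["backup"])
    if primary is not None:
        return primary
    if backup is not None:
        tag["primary"] = pack(backup)             # restore the corrupted copy
        return backup
    raise IOError("both copies corrupted")

tag = {}
write_tag(tag, b"dining card balance: 42")
damaged = bytearray(tag["primary"]); damaged[0] ^= 0xFF
tag["primary"] = bytes(damaged)                   # simulate corruption on the tag
assert read_tag(tag) == b"dining card balance: 42"
```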

01 Jan 2013
TL;DR: In this article, the authors propose MemPick, a tool that detects and classifies high-level data structures used in stripped C/C++ binaries, such as lists and trees.
Abstract: Most current techniques for data structure reverse engineering are limited to low-level programming constructs, such as individual variables or structs. In practice, pointer networks connect some of these constructs to form higher-level entities like lists and trees. The lack of information about the pointer network limits our ability to efficiently perform forensics and reverse engineering. To fill this gap, we propose MemPick, a tool that detects and classifies high-level data structures used in stripped C/C++ binaries. By analyzing the evolution of the heap during program execution, it identifies and classifies the most commonly used data structures, such as singly- or doubly-linked lists, many types of trees (e.g., AVL, red-black trees, B-trees), and graphs. We evaluated MemPick on a wide variety of popular libraries and real world applications with great success.
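A toy illustration of shape-based classification, far simpler than MemPick's actual analysis: given a snapshot of heap nodes and their outgoing pointers, it guesses whether the pointer graph currently behaves like a singly linked list, a doubly linked list, or a binary tree. The heuristics and names are assumptions.

```python
def classify(succ):
    """succ maps node id -> list of node ids it points to (null pointers omitted)."""
    nodes = set(succ)
    out_deg = {n: len(ps) for n, ps in succ.items()}
    indeg = {n: 0 for n in nodes}
    for ps in succ.values():
        for p in ps:
            indeg[p] = indeg.get(p, 0) + 1
    # Every edge has a matching reverse edge -> forward/backward link pairs.
    mutual = all(n in succ.get(p, []) for n, ps in succ.items() for p in ps)
    if mutual and max(out_deg.values()) <= 2:
        return "doubly linked list"
    if max(out_deg.values()) <= 1 and max(indeg.values()) <= 1:
        return "singly linked list"
    if max(indeg.values()) <= 1 and max(out_deg.values()) <= 2:
        return "binary tree"
    return "unclassified graph"

assert classify({1: [2], 2: [3], 3: []}) == "singly linked list"
assert classify({1: [2, 3], 2: [], 3: []}) == "binary tree"
```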

01 Jan 2013
TL;DR: This paper presents an error detection and correction method for Euclidean geometry Low density parity check (EG-LDPC) codes with majority logic decoding which significantly reduces memory access time and also it takes three iterations instead of N iterations when there is no error in the data read.
Abstract: Memory is a basic resource in every digital system, but single event upsets (SEUs) caused by ions or electromagnetic radiation can alter these memories by changing their states. Error-correcting code memory (ECC memory) is a type of computer data storage that can detect and correct the more common kinds of internal data corruption. This paper presents an error detection and correction method for Euclidean geometry low density parity check (EG-LDPC) codes with majority logic decoding. The application is mainly focused on memories; MLDD is used here due to its capability of correcting large numbers of errors, even though it requires a large decoding time that affects memory performance. This is overcome by the proposed technique, which significantly reduces memory access time: it takes three iterations instead of N iterations when there is no error in the data read, and it uses the majority logic decoder itself to detect failures, which minimizes area and power consumption.

Proceedings ArticleDOI
24 Jun 2013
TL;DR: An ongoing research that aims at mitigating the impact of data corruption due to device driver defects on the availability and the integrity of data is described.
Abstract: Critical systems widely depend on operating systems to perform their mission. Device drivers are a critical and defect-prone part of operating systems. Software defects in device drivers often cause corruption of data that may lead to data losses, that are a significant source of costs for large enterprise systems. This paper describes an ongoing research that aims at mitigating the impact of data corruption due to device driver defects on the availability and the integrity of data. We discuss a methodology for run-time detection and the tolerance of protocol violations in device drivers and then we present a preliminary activity that we are currently performing.

Journal ArticleDOI
TL;DR: Two examples of platform security and a comparison of three techniques for how users can secure data and computations in a cloud environment, to maintain confidentiality and integrity and to encourage enterprises and users to use cloud storage systems on secure infrastructures and platforms that maintain their customers' and their own data privacy.
Abstract: The demand for secure environments for systems' information on cloud systems is rising, since these should be more secure and kept away from system damage, disasters, and data corruption. The dream became real with cloud computing. Cloud computing is becoming more popular and widely used, especially in enterprises where cost matters. These enterprises can reduce IT costs by using cloud computing services. Most enterprises that do not use cloud computing fear for platform security and information security. Users always seek to verify the confidentiality and integrity of their data and computations before using cloud computing. Much research has been conducted in these two fields for infrastructure and platform security and for data confidentiality and integrity. This paper presents two examples of platform security and a comparison of three techniques for how users can secure data and computations in a cloud environment to maintain confidentiality and integrity. The ultimate goal of providing such a comparison is to encourage enterprises and users to use cloud storage systems on secure infrastructures and platforms that maintain their customers' and their own data privacy.