
Showing papers on "Data Corruption published in 2020"


Journal ArticleDOI
TL;DR: A collaborative auditing blockchain framework for cloud data storage in which all consensus nodes substitute for the single third party auditor to execute auditing delegations and record them permanently, thereby preventing entities from deceiving each other.
Abstract: Cloud storage systems provide data owners with a remote storage service, which allows them to outsource data without a local storage burden. Nevertheless, the cloud storage service is not fully trustworthy, since it may not be honest and remote data may be corrupted. One way to ensure trustworthy preservation of cloud data is the remote data auditing method, through which data owners can check the storage reliability of the cloud system on demand and avoid potential data corruption in time. However, private auditing methods fail to promise mutual trust in auditing results. Thus, public auditing methods are introduced, in which traditionally a third party auditor is delegated to interact with cloud service providers for auditing tasks. Although the third party auditor serves as a medium to exchange trust, a centralized third party is hard to keep neutral, which exposes remote data auditing to threats such as collusion attacks. To address the trust problem between data owners and cloud service providers, we propose a collaborative auditing blockchain framework for cloud data storage. In this framework, all consensus nodes substitute for the single third party auditor to execute auditing delegations and record them permanently, thereby preventing entities from deceiving each other. Security analysis shows that the proposed framework has the advantage of preserving remote data integrity against various attacks. Performance analysis demonstrates that the framework is more functional and resource-friendly than existing schemes.

35 citations


Proceedings ArticleDOI
30 May 2020
TL;DR: This work proposes to offload the update and verification of system-level redundancy to Tvarak, a new hardware controller co-located with the last-level cache, which enables efficient protection of data from bugs in memory controller and NVM DIMM firmware.
Abstract: Production storage systems complement device-level ECC (which covers media errors) with system-checksums and cross-device parity. This system-level redundancy enables systems to detect and recover from data corruption due to device firmware bugs (e.g., reading data from the wrong physical location). Direct access to NVM penalizes software-only implementations of system-level redundancy, forcing a choice between lack of data protection or significant performance penalties. We propose to offload the update and verification of system-level redundancy to Tvarak, a new hardware controller co-located with the last-level cache. Tvarak enables efficient protection of data from such bugs in memory controller and NVM DIMM firmware. Simulation-based evaluation with seven data-intensive applications shows that Tvarak is efficient. For example, Tvarak reduces Redis set-only performance by only 3%, compared to 50% reduction for a state-of-the-art software-only approach.
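The system-level checksum idea that Tvarak accelerates in hardware can be sketched in software. The following is a minimal, purely illustrative Python model of per-block checksums kept alongside data so that reads can detect misdirected or corrupted device writes; the class and its block granularity are hypothetical and are not the Tvarak design.

```python
import zlib

class ChecksummedStore:
    """Toy NVM-like store that keeps a system-level checksum per block,
    so reads can detect firmware bugs such as misdirected writes."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.checksums = [zlib.crc32(b) for b in self.blocks]

    def write_block(self, idx, data):
        assert len(data) == self.block_size
        self.blocks[idx] = data
        self.checksums[idx] = zlib.crc32(data)   # redundancy updated with the data

    def read_block(self, idx):
        data = self.blocks[idx]
        if zlib.crc32(data) != self.checksums[idx]:
            raise IOError(f"block {idx}: checksum mismatch (possible firmware bug)")
        return data

store = ChecksummedStore(num_blocks=8)
store.write_block(0, b"a" * 4096)
store.blocks[0] = b"b" * 4096          # simulate a misdirected/corrupted device write
try:
    store.read_block(0)
except IOError as e:
    print("detected:", e)
```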

12 citations



Proceedings ArticleDOI
05 Apr 2020
TL;DR: A new on-line error correcting scheme based on partial and selective checksums is proposed, which can correct errors in the field while achieving low decoding latency and comparatively small memory and area overhead to guarantee protection against errors in a single column.
Abstract: Resistive RAM technology, with its in-memory computation and matrix-vector multiplication capabilities, has paved the way for efficient hardware implementations of neural networks. The ability to store the trained weights and perform a direct matrix-vector multiplication with the applied inputs, producing the outputs directly, removes a lot of memory transfer overhead. But such schemes are prone to various soft and hard errors due to immature fabrication processes creating marginal cells, read disturbance errors, etc. Soft errors are of particular concern since they can cause misclassification of objects, leading to catastrophic consequences for safety-critical applications. Since the locations of soft errors are not known in advance, they can manifest in the field, leading to data corruption. In this paper, a new on-line error correcting scheme is proposed based on partial and selective checksums which can correct errors in the field. The proposed scheme can correct any number of errors in a single column of a given RRAM matrix. Two different checksum computation schemes are proposed: a majority voting-based scheme and a Hamming code-based scheme. The memory overhead and the decoding area, latency and dynamic power consumption of both proposed schemes are presented. The proposed solutions achieve low decoding latency and comparatively small memory and area overhead while guaranteeing protection against errors in a single column. Lastly, an extension of the scheme to multiple column errors is also discussed.
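As an illustration of how checksums can repair a single faulty column, the sketch below uses plain row-sum/column-sum checksums; it is not the paper's majority-voting or Hamming-code scheme, and the RRAM crossbar is modeled simply as a NumPy matrix.

```python
import numpy as np

def make_checksums(W):
    # Row sums recover the elements of one faulty column; column sums locate it.
    return W.sum(axis=1), W.sum(axis=0)

def correct_single_column(W, row_sums, col_sums):
    """Correct any number of errors confined to a single column of W.
    The column whose sum disagrees with its stored checksum is faulty;
    each entry in it is recomputed from the row checksum.
    (Errors that cancel in the column sum would evade this simple locator.)"""
    col_err = W.sum(axis=0) - col_sums
    bad_cols = np.nonzero(col_err)[0]
    if bad_cols.size == 0:
        return W                          # nothing to fix
    if bad_cols.size > 1:
        raise ValueError("errors span multiple columns; scheme does not apply")
    c = bad_cols[0]
    fixed = W.copy()
    # element = row checksum minus the sum of the row's other (trusted) entries
    fixed[:, c] = row_sums - (W.sum(axis=1) - W[:, c])
    return fixed

W = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
row_sums, col_sums = make_checksums(W)
corrupted = W.copy()
corrupted[0, 1] += 10.0                   # two soft errors in column 1
corrupted[2, 1] -= 3.0
print(correct_single_column(corrupted, row_sums, col_sums))
```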

10 citations


Proceedings ArticleDOI
27 Jun 2020
TL;DR: In this article, the authors present a general-purpose algorithm called ddmax that addresses these problems automatically by maximizing the subset of the input that can still be processed by the program, thus recovering and repairing as much data as possible.
Abstract: When a program fails to process an input, it need not be the program code that is at fault. It can also be that the input data is faulty, for instance as a result of data corruption. To get the data processed, one then has to debug the input data---that is, (1) identify which parts of the input data prevent processing, and (2) recover as much of the (valuable) input data as possible. In this paper, we present a general-purpose algorithm called ddmax that addresses these problems automatically. ddmax maximizes the subset of the input that can still be processed by the program, thus recovering and repairing as much data as possible; the difference between the original failing input and the "maximized" passing input includes all input fragments that could not be processed. To the best of our knowledge, ddmax is the first approach that fixes faults in the input data without requiring program analysis. In our evaluation, ddmax repaired about 69% of input files and recovered about 78% of data within one minute per input.
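A greatly simplified, greedy sketch of input maximization in the spirit of ddmax is shown below (the published algorithm is based on delta debugging and is more sophisticated); the `process` callback and the line-oriented example are hypothetical.

```python
def maximize_input(data: bytes, process, chunk_size=64):
    """Greedy sketch of input maximization: start from an empty input and
    re-add fixed-size fragments of the failing input, keeping each fragment
    only if the result is still processable. The final input is a passing
    subset; the dropped fragments are the parts that prevented processing."""
    def passes(candidate):
        try:
            process(candidate)
            return True
        except Exception:
            return False

    kept = b""
    dropped = []
    for i in range(0, len(data), chunk_size):
        fragment = data[i:i + chunk_size]
        if passes(kept + fragment):
            kept += fragment
        else:
            dropped.append((i, fragment))
    return kept, dropped

# Hypothetical usage: a processor that requires every line to be an integer.
def process(b: bytes):
    for line in b.splitlines():
        int(line)                         # raises ValueError on corrupted lines

data = b"1\n2\nx\n4\n"
repaired, bad = maximize_input(data, process, chunk_size=2)
print(repaired, bad)                      # b'1\n2\n4\n' and the rejected fragment
```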

9 citations


Proceedings ArticleDOI
01 Dec 2020
TL;DR: In this paper, the authors proposed the implementation of the blockchain in federated learning for local parameter evaluation and global parameter aggregation, thus alleviating the influence of end-point adversarial training data.
Abstract: With the approach of the 5G society, more and more devices are connected to the Internet, where information is stored, analyzed, and shared. Federated learning allows participants to train a machine learning model by sharing its parameters based on local training, instead of the raw private data kept locally. In this research, we propose the implementation of the blockchain in federated learning for local parameter evaluation and global parameter aggregation, thus alleviating the influence of end-point adversarial training data. Besides, all updates of local parameters are encrypted and stored in a block of the blockchain after consensus by the committee. We evaluate the performance of the scheme when applying various types of corruption to the adversary's dataset, including noise of various degrees and circle occlusion of various diameters. The scheme shows robust and resilient performance compared with traditional federated learning, achieving a validation accuracy of 0.957 when noise with a degree of 1.0 is added, and 0.944 when circle occlusion with a diameter of 28 pixels is applied to the classification task.
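The two corruption types used in the evaluation can be reproduced approximately as follows; interpreting the noise "degree" as the standard deviation of additive Gaussian noise and applying the occlusion at the image center are assumptions, not details taken from the paper.

```python
import numpy as np

def add_noise(img, degree, rng=None):
    """Additive Gaussian noise; 'degree' is assumed here to be the standard
    deviation relative to an image value range of [0, 1]."""
    rng = rng or np.random.default_rng(0)
    noisy = img + rng.normal(0.0, degree, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)

def circle_occlusion(img, diameter, center=None):
    """Zero out a circular region of the given diameter (in pixels)."""
    h, w = img.shape
    cy, cx = center or (h // 2, w // 2)
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= (diameter / 2) ** 2
    out = img.copy()
    out[mask] = 0.0
    return out

img = np.random.default_rng(1).random((28, 28))       # stand-in for a 28x28 input image
corrupted_noise = add_noise(img, degree=1.0)
corrupted_occl = circle_occlusion(img, diameter=28)   # occludes nearly the whole image
```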

9 citations


Journal ArticleDOI
TL;DR: The case study clearly demonstrates that a proper data quality management process and information extraction methods are essential to carry out intelligent digitalization in the oil and gas industry.
Abstract: Data analytics is a process of acquiring, transforming, interpreting, modelling, displaying and storing data with the aim of extracting useful information, so that decision-making, action execution, event detection and incident management can be handled in an efficient and certain manner. However, data analytics also faces challenges, for instance data corruption due to noise, time delays, missing data, external disturbances, etc. This paper focuses on data quality improvement to cleanse, improve and interpret post-well or real-time data so as to preserve and enhance data features such as accuracy, consistency, reliability and validity. In this study, laboratory data and field data are used to illustrate data issues and to show data quality improvements using different data processing methods. The case study clearly demonstrates that a proper data quality management process and information extraction methods are essential to carry out intelligent digitalization in the oil and gas industry.

8 citations


Journal ArticleDOI
TL;DR: The asymmetry of the errors in Embedded DRAMs (eDRAMs) is exploited for error-tolerant designs that are redundancy-free in terms of memory cells, without using any ECC or parity.
Abstract: For some applications, errors have a different impact on data and memory systems depending on whether they change a zero to a one or the other way around; for an unsigned integer, a one to zero (or zero to one) error reduces (or increases) the value. For some memories, errors are also asymmetric; for example, in a DRAM, retention failures discharge the storage cell. The tolerance of such asymmetric errors would result in a robust and efficient system design. Error Control Codes (ECCs) are one common technique for memory protection against these errors by introducing some redundancy in memory cells. In this paper, the asymmetry in the errors in Embedded DRAMs (eDRAMs) is exploited for error-tolerant designs without using any ECC or parity, which are redundancy-free in terms of memory cells. A model for the impact of retention errors and refresh time of eDRAMs on the False Positive rate or False Negative rate of some eDRAM applications is proposed and analyzed. Bloom Filters (BFs) and read-only or write-through caches implemented in eDRAMs are considered as the first case studies for this model. For BFs, their tolerance to some zero to one errors (but not one to zero errors) is combined with the asymmetry of retention errors in eDRAMs to show that no ECC or parity is needed to protect the filter; moreover, the eDRAM refresh time can significantly be increased, thus reducing its power consumption. For caches, this paper shows that asymmetry in errors can be exploited also by using a redundancy-free error-tolerant scheme, which only introduces false negatives, but no false positives, therefore causing no data corruption. The proposed redundancy-free implementations have been compared with existing schemes for BFs and caches to show the benefits in terms of different figures of merit such as memory size, area, decoder/encoder complexity and delay. Finally, in the last case study, we show that the asymmetry of retention errors can be used to develop additional error correction capabilities in Modular Redundancy Schemes.
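The asymmetry the paper exploits for Bloom filters can be demonstrated with a toy filter: bit flips in the direction a Bloom filter tolerates only add false positives and never hide inserted items, whereas flips in the other direction would cause false negatives. This sketch only illustrates that property; it does not model eDRAM retention behavior or the paper's refresh-time analysis.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)

# A 0 -> 1 bit flip (the direction a Bloom filter tolerates) can only turn a
# "not present" answer into "present": extra false positives are possible, but
# every inserted item is still reported as present.
bf.bits[7] = 1
assert all(w in bf for w in ["alpha", "beta", "gamma"])

# A 1 -> 0 flip, by contrast, could make an inserted item disappear
# (a false negative), which is why the direction of the errors matters.
```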

6 citations


Posted Content
08 Jun 2020
TL;DR: Picket, a first-of-its-kind system that enables data diagnostics for machine learning pipelines over tabular data, is presented; it offers consistently accurate diagnostics during both training and deployment of various models ranging from SVMs to neural networks, beating competing methods of data quality validation in machine learning pipelines.
Abstract: Data corruption is an impediment to modern machine learning deployments. Corrupted data can severely bias the learned model and can also lead to invalid inference. We present Picket, a first-of-its-kind system that enables data diagnostics for machine learning pipelines over tabular data. Picket can safeguard against data corruptions that lead to degradation either during training or deployment. For the training stage, Picket identifies erroneous training examples that can result in a biased model; for the deployment stage, Picket flags corrupted query points submitted to a trained machine learning model that, due to noise, would result in incorrect predictions. Picket is built around a novel self-supervised deep learning model for mixed-type tabular data. Learning this model is fully unsupervised to minimize the burden of deployment, and Picket is designed as a plugin that can increase the robustness of any machine learning pipeline. We evaluate Picket on a diverse array of real-world data considering different corruption models that include systematic and adversarial noise. We show that Picket offers consistently accurate diagnostics during both training and deployment of various models ranging from SVMs to neural networks, beating competing methods of data quality validation in machine learning pipelines.

5 citations


Journal ArticleDOI
01 Apr 2020
TL;DR: A multi-valued decision diagram-based approach is developed to quantitatively evaluate system reliability for both models, considering the time-dependence of sequential events, in order to study the reliability of distributed storage systems under both data loss and data theft.
Abstract: With the advancement of cloud computing and internet of things, data are usually stored on distributed computers and these data may risk being lost or stolen. In this article, we consider a common ...

5 citations


Journal ArticleDOI
Shuai Yin1
01 Jun 2020
TL;DR: In order to ensure the security and integrity of data, technicians need to establish an algorithm for checking data authentication results, one that ensures the reliability and utilization of data by improving on the limited verification results.
Abstract: As one of the most widely used big data computing technologies, cloud data storage brings great convenience to users. However, while enabling data sharing, it also gives rise to data corruption and data integrity problems. Today, the integrity of remote data is publicly verified by trusted third parties, which creates the potential threat that the verifier provides false information, making the resulting validation data untrustworthy. In order to ensure the security and integrity of data, technicians need to establish an algorithm for checking data authentication results. Moreover, this algorithm ensures the reliability and utilization of data by improving on the limited verification results.

Proceedings ArticleDOI
01 Dec 2020
TL;DR: In this paper, an autoencoder based approach to anomaly detection in smart grid systems is proposed, which can be used to detect malicious attacks or physical malfunctions in smart homes.
Abstract: We propose an autoencoder based approach to anomaly detection in smart grid systems. Data collecting sensors within smart home systems are susceptible to many data corruption issues, such as malicious attacks or physical malfunctions. By applying machine learning to a smart home or grid, sensor anomalies can be detected automatically for secure data collection and sensor-based system functionality. In addition, we tested the effectiveness of this approach on real smart home sensor data collected for multiple years. An early detection of such data corruption issues is essential to the security and functionality of the various sensors and devices within a smart home.
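A minimal sketch of the general autoencoder approach is given below, assuming PyTorch and synthetic sensor windows; the architecture, the 3-sigma threshold rule, and the data are stand-ins rather than the authors' setup.

```python
import torch
import torch.nn as nn

# Synthetic "normal" sensor windows: 16 readings per sample.
torch.manual_seed(0)
normal = torch.randn(1000, 16) * 0.1 + torch.linspace(0, 1, 16)

model = nn.Sequential(
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 3), nn.ReLU(),           # small bottleneck
    nn.Linear(3, 8), nn.ReLU(),
    nn.Linear(8, 16),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                      # train on normal data only
    opt.zero_grad()
    loss = loss_fn(model(normal), normal)
    loss.backward()
    opt.step()

with torch.no_grad():
    errs = ((model(normal) - normal) ** 2).mean(dim=1)
threshold = errs.mean() + 3 * errs.std()  # assumed 3-sigma anomaly rule

def is_anomalous(window):
    with torch.no_grad():
        err = ((model(window) - window) ** 2).mean()
    return err > threshold

corrupted = normal[0] + torch.randn(16) * 2.0   # simulated sensor fault
print(bool(is_anomalous(corrupted)))
```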

Proceedings ArticleDOI
01 Oct 2020
TL;DR: In this paper, the authors identify potential security threats around the blockchain and propose potential countermeasures to protect blockchain users against three threats targeting data corruption, data protection and input falsification.
Abstract: Blockchain and Industry 4.0 have the potential to revolutionize the way entities work together in the industrial environment. Former paper-based and manual tasks, such as maintenance, will be replaced by smart contracts in combination with automatic data collection. We consider different data sources and possibilities for integrating them into a blockchain ecosystem, taking integration and security aspects into account. We identify potential security threats around the blockchain and propose potential countermeasures. Finally, we provide three solutions to protect blockchain users against three threats targeting data corruption, data protection and input falsification.

Book ChapterDOI
Shuai Huang1, Jing Xiao1, Hong-liang Mao, Mu-ran Su, Hai-bo Ying, Shuang Tan 
06 Aug 2020
TL;DR: A data integrity verification scheme based on blockchain and blind homomorphic tags is proposed to tackle the over-reliance on the third party auditor; a smart contract is used to control the access of different auditors, which allows users to change auditors freely.
Abstract: With the advent of big data, users' data is usually outsourced to the cloud. However, users then lose absolute control over the data, and its integrity is hard to guarantee. Currently, the most effective way to detect data corruption in the cloud is data integrity verification, which usually relies on a third party. However, the third party is not always credible. This paper proposes a data integrity verification scheme based on blockchain and blind homomorphic tags to tackle the over-reliance on the third party auditor. Firstly, blockchain technology is used to weaken the centralization of the third party. Secondly, a smart contract is used to control the access of different auditors, which allows users to change auditors freely. Lastly, blind homomorphic tags are proposed to avoid recomputing tags when users change auditors. Based on the experimental results, our proposed scheme is more credible and has a higher recognition rate under the same computational overhead when compared with other mechanisms.
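The paper's blind homomorphic tags are not specified in this abstract, but the reason homomorphic tags help at all can be illustrated with a textbook-style linearly homomorphic tag over a prime field: the auditor can verify a random linear combination of blocks without downloading them. The sketch below uses toy parameters, a stand-in PRF, and a verifier that holds the secret key, so it is not secure and is not the proposed scheme.

```python
import secrets

P = (1 << 127) - 1             # toy prime modulus; far too small for real use
ALPHA = secrets.randbelow(P)   # data owner's secret

def r(index):
    # Stand-in per-block secret; a real scheme would use a keyed PRF such as HMAC.
    return pow(1234567 + index, 3, P)

def tag(index, block):
    """Linearly homomorphic tag: sigma_i = alpha * m_i + r_i (mod p)."""
    return (ALPHA * block + r(index)) % P

def prove(blocks, sigmas, challenge):
    """Cloud side: aggregate the challenged blocks and tags."""
    mu = sum(c * blocks[i] for i, c in challenge) % P
    sigma = sum(c * sigmas[i] for i, c in challenge) % P
    return mu, sigma

def verify(challenge, mu, sigma):
    """Auditor side: checks sigma == alpha*mu + sum(c_i * r_i) without the blocks."""
    return sigma == (ALPHA * mu + sum(c * r(i) for i, c in challenge)) % P

blocks = [secrets.randbelow(P) for _ in range(8)]   # file blocks as field elements
sigmas = [tag(i, m) for i, m in enumerate(blocks)]
challenge = [(i, secrets.randbelow(P)) for i in (1, 4, 6)]

print(verify(challenge, *prove(blocks, sigmas, challenge)))   # True
blocks[4] ^= 1                                                # simulate corruption
print(verify(challenge, *prove(blocks, sigmas, challenge)))   # False (w.h.p.)
```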

Journal ArticleDOI
TL;DR: The paper expands on earlier contributions, arguing the need for a new notion of security based on the assumption that it is computationally difficult for an adversary to corrupt some ciphertext so that the resulting plaintext demonstrates specific patterns.

Patent
12 Mar 2020
TL;DR: In this article, a method for analyzing data corruption is presented, in which a data set resides on tracks of a volume and the method determines, from control information associated with the volume, on which tracks of the volume the data set resides.
Abstract: A method for analyzing data corruption is disclosed. In one embodiment, such a method includes identifying a data set to analyze for data corruption. This data set resides on tracks of a volume. The method further determines, from control information associated with the volume, on which tracks of the volume the data set resides. The method reads content of the data set without opening the data set by performing full-track reads of the tracks. The method further determines an expected format of the content by analyzing the control information. An actual format of the content is compared to the expected format to identify areas of the data set that may be corrupt. A corresponding system and computer program product are also disclosed.
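A rough software sketch of the comparison step is shown below; the track map, the "control information" fields, and the blank-padded record rule are all hypothetical stand-ins used only to illustrate checking actual content against an expected format.

```python
# Hypothetical control information: which tracks hold the data set and the
# expected fixed record length for its contents.
control_info = {"tracks": [12, 13, 14], "record_length": 80}

def full_track_read(volume, track):
    """Stand-in for a raw full-track read (no data-set open)."""
    return volume[track]

def find_suspect_tracks(volume, control_info):
    """Flag tracks whose content does not match the expected record format."""
    rec_len = control_info["record_length"]
    suspects = []
    for track in control_info["tracks"]:
        raw = full_track_read(volume, track)
        if len(raw) % rec_len != 0:
            suspects.append((track, "track length not a multiple of record length"))
            continue
        for off in range(0, len(raw), rec_len):
            record = raw[off:off + rec_len]
            # Hypothetical format rule: records are blank-padded text,
            # so embedded NUL bytes suggest corruption.
            if b"\x00" in record:
                suspects.append((track, f"unexpected bytes at offset {off}"))
                break
    return suspects

volume = {12: b" " * 160, 13: b" " * 159, 14: b" " * 80 + b"\x00" * 80}
print(find_suspect_tracks(volume, control_info))
```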

Proceedings ArticleDOI
01 Dec 2020
TL;DR: In this paper, an association rule mining based algorithm is used to estimate the probability of corruption in each bit; the best recovery rate of 66% was found in the most complex scenario, i.e., random bit corruption.
Abstract: Embedded systems are finding their way into almost every aspect of our daily life, from mp3 players and console games to mobile phones. Various Artificial Intelligence (AI) based applications are commonly deployed on embedded systems, including computer vision based approaches. The demand for higher accuracy in computer vision applications comes with increased complexity of convolutional neural networks and larger storage requirements for saving pre-trained networks. Different factors can lead to data corruption in the storage units of embedded systems, which can result in drastic failures due to the propagation of the errors. Hence, the development of software-based algorithms for the detection and recovery of data corruption is crucial for the improvement and failure-prevention of embedded systems. This paper proposes a new algorithm for the recovery of data in the case of single event upset (SEU) errors. An association rule mining based algorithm is used to estimate the probability of corruption in each bit. The recovery algorithm was tested on four different pre-trained ResNet models (ResNet32 and ResNet110, at two different accuracy levels each), and the best recovery rate of 66% was found in the most complex scenario, i.e., random bit corruption. However, for special cases of SEU errors, e.g. errors in frequently repeated bits, the recovery rate was perfect, with a value of 100%.
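For intuition, a single event upset in a float32 weight can often be spotted and undone with a much simpler plausibility check than the paper's association-rule-mining approach; the sketch below tries every single-bit flip and keeps the candidate whose magnitude looks like a trained weight. The `max_abs` plausibility bound is an assumption.

```python
import math
import struct

def bit_flip(value: float, bit: int) -> float:
    """Flip one bit of a float32 and return the resulting value."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    (flipped,) = struct.unpack(">f", struct.pack(">I", as_int ^ (1 << bit)))
    return flipped

def recover_seu(weight: float, max_abs: float = 1.0) -> float:
    """If a weight looks implausible (e.g. an exponent-bit SEU made it huge),
    try all 32 single-bit flips and return the most plausible candidate.
    Plausibility here is simply 'finite and within the expected magnitude',
    a stand-in for the paper's association-rule-based bit probabilities."""
    if math.isfinite(weight) and abs(weight) <= max_abs:
        return weight                              # looks fine, leave it alone
    candidates = [bit_flip(weight, b) for b in range(32)]
    plausible = [c for c in candidates if math.isfinite(c) and abs(c) <= max_abs]
    return min(plausible, key=abs, default=weight)

original = 0.0375
corrupted = bit_flip(original, 30)                 # SEU in a high exponent bit
print(corrupted)                                   # huge, clearly implausible value
print(recover_seu(corrupted))                      # recovers the float32 weight value
```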

Patent
12 Mar 2020
TL;DR: In this article, a data integrity check entity receives a request from a data protection entity to perform a data integrity check on a backup dataset, accesses an information set created by the data protection entity concerning the backup dataset, performs an integrity check process that analyzes the backup dataset and the information contained in the information set and, based on the analysis, identifies a data integrity problem that resulted from a data corruption event involving the backup dataset.
Abstract: One example method, which may be performed by a data integrity check entity, includes receiving a request from a data protection entity to perform a data integrity check regarding a backup dataset, where the backup dataset includes a backup of an entity other than the data integrity check entity and the data protection entity. The method further includes accessing an information set created by the data protection entity concerning the backup dataset, performing a data integrity check process that includes analyzing the backup dataset and the information contained in the information set and, based on the analysis, identifying a data integrity problem that resulted from a data corruption event involving the backup dataset, and reporting the results of the analysis of the backup dataset and the information set. As between the data integrity check entity and the data protection entity, only the data integrity check entity checks the integrity of data in the backup dataset.

Book ChapterDOI
06 Aug 2020
TL;DR: A secure auditing scheme based on blockchain and Intel SGX technology, termed SDABS, is proposed; the scheme follows the properties of storage correctness, data-preserving, accountability, and anti-collusion.
Abstract: With the continuous growth of data resources, outsourcing data storage to cloud service providers is becoming the norm. Unfortunately, once data are stored on the cloud platform, they are out of the data owners' control. Thus, it is critical to guarantee the integrity of the remote data. To solve this problem, researchers have proposed many data auditing schemes, which often employ a trusted role named the Third Party Auditor (TPA) to verify the integrity. However, the TPA may not be as reliable as expected. For example, it may collude with cloud service providers to hide the fact of data corruption for benefits. Blockchain has the characteristics of decentralization, non-tampering, and traceability, which provides a way to trace the malicious behaviors of the TPA. Moreover, Intel SGX, as a popular trusted computing technology, can be used to protect the correctness of the auditing operations with a slight performance cost, which makes it an excellent complement to the blockchain-based solution. In this paper, we propose a secure auditing scheme based on the blockchain and Intel SGX technology, termed SDABS. The scheme follows the properties of storage correctness, data-preserving, accountability, and anti-collusion. The experiment results show that our scheme is efficient.

Proceedings ArticleDOI
01 Nov 2020
TL;DR: This paper proposes a novel approach that addresses all the above challenges by developing a distributed robust regression algorithm that optimizes the regression coefficients of each target in parallel with a heuristically estimated corruption ratio and consolidates the uncorrupted set using two strategies: global consensus and majority voting.
Abstract: Multi-target regression has recently drawn increasing attention in the machine learning community due to its capability of simultaneously predicting multiple continuous target variables based on a given set of input features. Jointly handling the inter-target correlations and input-output relationships is very challenging. That task becomes even more intricate in the presence of correlated data corruption. We observe that traditional robust methods can hardly deal with several emerging challenges, including 1) presence of correlated corruption among targets in the datasets, 2) difficulty in estimating the data corruption ratio, and 3) scalability to massive datasets. This paper proposes a novel approach that addresses all the above challenges by developing a distributed robust regression algorithm. Specifically, the algorithm optimizes the regression coefficients of each target in parallel with a heuristically estimated corruption ratio and then consolidates the uncorrupted set using two strategies: global consensus and majority voting. Also, we prove that our algorithm benefits from strong guarantees in terms of convergence rates and coefficient recovery, and it can be applied as a generic framework for robust regression problems with correlated corruption. Extensive experiments on synthetic and real-world datasets demonstrate that our algorithm is superior to existing methods in both effectiveness and efficiency.
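A single-machine simplification of the idea (not the authors' distributed algorithm) is sketched below: each target is fit robustly by iterative residual trimming with an assumed corruption ratio, and the per-target "clean" sets are consolidated by majority voting.

```python
import numpy as np

def robust_fit(X, y, corruption_ratio, iters=20):
    """Least squares with hard thresholding: repeatedly fit on the samples with
    the smallest residuals, discarding the assumed-corrupted fraction."""
    n = len(y)
    keep = np.arange(n)
    for _ in range(iters):
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = np.abs(X @ w - y)
        keep = np.argsort(resid)[: int(n * (1 - corruption_ratio))]
    return w, set(keep.tolist())

def multi_target_robust(X, Y, corruption_ratio=0.2):
    """Fit each target independently, then keep samples voted clean by a
    majority of targets (the 'majority voting' consolidation strategy)."""
    n, t = Y.shape
    votes = np.zeros(n)
    coefs = []
    for j in range(t):
        w, clean = robust_fit(X, Y[:, j], corruption_ratio)
        coefs.append(w)
        votes[list(clean)] += 1
    consensus_clean = np.nonzero(votes > t / 2)[0]
    return np.column_stack(coefs), consensus_clean

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
W_true = rng.normal(size=(5, 3))
Y = X @ W_true + 0.01 * rng.normal(size=(200, 3))
Y[:30] += 10 * rng.normal(size=(30, 3))        # correlated corruption in 15% of rows

W_hat, clean_idx = multi_target_robust(X, Y, corruption_ratio=0.2)
print(np.max(np.abs(W_hat - W_true)))          # small; corrupted rows were excluded
```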

Patent
04 Jun 2020
TL;DR: In this paper, a system and method for high-speed transfer of small data sets, that provides near-instantaneous bit-level lossless compression, that is ideal for communications environments that cannot tolerate even small amounts of data corruption, have very low latency tolerance, where data has a low entropy rate, and where every bit costs the user bandwidth, power, or time so that deflation is worthwhile.
Abstract: A system and method for high-speed transfer of small data sets, that provides near-instantaneous bit- level lossless compression, that is ideal for communications environments that cannot tolerate even small amounts of data corruption, have very low latency tolerance, where data has a low entropy rate, and where every bit costs the user bandwidth, power, or time so that deflation is worthwhile. Where some loss of data can be tolerated, the system and method can be configured for use as lossy compression.

Proceedings ArticleDOI
06 Nov 2020
TL;DR: The nodes between source and destination are known as intermediate nodes; they take part in the data forwarding process, and forwarding through them is subject to problems such as data loss, data mismatch, data corruption and data deletion.
Abstract: The nodes between source and destination are known as intermediate nodes, i.e., the nodes involved in the data forwarding process. A network is designed from nodes interconnected with one another; each node connects to the next in full duplex mode to form the connected network. Detection of false data plays a major role in network security when data is forwarded through a number of nodes. Forwarding data through intermediate nodes is subject to problems such as data loss, data mismatch, data corruption and data deletion. A number of techniques have been developed to detect such problems, but preventing them is a different matter from detecting them.

Patent
David C. Reed1, Gregory E. McBride1
02 Jan 2020
TL;DR: In this article, a method for analyzing data corruption is presented, which identifies a specific location within the data set containing corrupted data and analyzes the specific location to determine if the corrupted data is contained therein.
Abstract: A method for analyzing data corruption is disclosed. In one embodiment, such a method includes identifying a data set containing corrupted data. The method identifies a specific location within the data set containing the corrupted data and analyzes the specific location to determine if the corrupted data is contained therein. The method repeatedly performs the following until the corrupted data is no longer found within the specific location: revert to a previous version of the specific location by removing an incremental update to the specific location, and analyze the previous version of the specific location to determine if it contains the corrupted data. When a previous version of the specific location is found that no longer contains the corrupted data, the method determines a timestamp associated with the previous version and provides the timestamp to a user. A corresponding system and computer program product are also disclosed.
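The revert-and-recheck loop can be sketched directly; the version history structure and the corruption predicate below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Version:
    timestamp: str      # when this incremental update was applied
    content: bytes      # state of the specific location after that update

def find_last_clean_timestamp(versions: List[Version],
                              is_corrupted: Callable[[bytes], bool]) -> Optional[str]:
    """Walk backward through incremental updates of the corrupted location
    until a version no longer contains the corruption; return its timestamp."""
    for version in reversed(versions):     # newest update removed first
        if not is_corrupted(version.content):
            return version.timestamp
    return None                            # corruption predates all known versions

# Hypothetical history of one location; corruption (b"\xde\xad") appears in the
# update applied on 2020-03-12.
history = [
    Version("2020-03-10T08:00Z", b"record-v1"),
    Version("2020-03-11T08:00Z", b"record-v2"),
    Version("2020-03-12T08:00Z", b"record-v2\xde\xad"),
]
print(find_last_clean_timestamp(history, lambda c: b"\xde\xad" in c))
# -> 2020-03-11T08:00Z
```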

Patent
30 Jul 2020
TL;DR: In this article, the unusual access patterns may be determined based on the number of data reads per unit time and/or the number of data writes per unit time, using a counter or a flag that is set each time a data portion is accessed.
Abstract: Detecting data corruption in a storage device includes periodically examining portions of the data for unusual access patterns and/or unusual data manipulation and providing an indication in response to detecting such patterns or manipulation. The unusual access patterns may be determined based on the number of data reads per unit time and/or the number of data writes per unit time. The number of data reads per unit time and the number of data writes per unit time may be determined using a counter or a flag that is set each time a data portion is accessed. Thresholds based on prior data accesses may be used to determine unusual access patterns. A user may set different thresholds for different portions of the data. A cyclic threshold may be used for cyclically accessed data and a level threshold may be used for non-cyclic data.
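A rough sketch of the counting-and-threshold idea follows; the per-interval reset, the threshold values, and the portion naming are assumptions rather than details from the patent.

```python
from collections import defaultdict

class AccessMonitor:
    """Count reads/writes per data portion per interval and flag portions
    whose activity exceeds user-set thresholds (a rough stand-in for the
    patent's counters/flags and per-portion thresholds)."""

    def __init__(self, read_threshold=1000, write_threshold=100):
        self.read_threshold = read_threshold
        self.write_threshold = write_threshold
        self.reads = defaultdict(int)
        self.writes = defaultdict(int)

    def record(self, portion, op):
        counter = self.reads if op == "read" else self.writes
        counter[portion] += 1

    def end_interval(self):
        """Called once per unit time: report suspicious portions, reset counters."""
        suspicious = []
        for portion, n in self.reads.items():
            if n > self.read_threshold:
                suspicious.append((portion, "reads", n))
        for portion, n in self.writes.items():
            if n > self.write_threshold:
                suspicious.append((portion, "writes", n))
        self.reads.clear()
        self.writes.clear()
        return suspicious

mon = AccessMonitor(write_threshold=3)
for _ in range(5):                       # e.g. a ransomware-style rewrite burst
    mon.record("volume0/extent7", "write")
print(mon.end_interval())                # [('volume0/extent7', 'writes', 5)]
```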

Patent
09 Jan 2020
TL;DR: Systems and methods for detecting data corruption or tampering in vehicle data systems are disclosed, in which first electronic navigational plan data comprising a plurality of waypoints is received and stored in a data store for later comparison against plan data reported by the vehicle management system.
Abstract: Systems and methods are disclosed for detecting data corruption in vehicle data systems. Systems and methods of detecting data corruption or tampering in vehicle data systems may include steps for receiving first electronic navigational plan data, the first electronic navigational plan data comprising a plurality of waypoints, and storing the first electronic navigational plan data in a data store. Systems and methods may further comprise receiving second electronic navigational plan data from a vehicle management system, the second electronic navigational plan data comprising a second plurality of waypoints, and, upon determining a discrepancy between the first plurality of waypoints and the second plurality of waypoints, generating an alert indicating possible data corruption in the second electronic navigational plan.
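The discrepancy check itself is straightforward to sketch; the waypoint representation and the coordinate tolerance below are assumptions.

```python
from math import isclose

# Hypothetical waypoint representation: (name, latitude, longitude) tuples.

def find_discrepancies(stored, reported, tol_deg=1e-4):
    """Compare the stored navigational plan against the one reported by the
    vehicle management system and return human-readable discrepancies."""
    issues = []
    if len(stored) != len(reported):
        issues.append(f"waypoint count differs: {len(stored)} vs {len(reported)}")
    for i, (a, b) in enumerate(zip(stored, reported)):
        same_pos = (isclose(a[1], b[1], abs_tol=tol_deg)
                    and isclose(a[2], b[2], abs_tol=tol_deg))
        if a[0] != b[0] or not same_pos:
            issues.append(f"waypoint {i}: stored {a} vs reported {b}")
    return issues

stored_plan = [("WP1", 47.4502, -122.3088), ("WP2", 47.9073, -122.2820)]
reported_plan = [("WP1", 47.4502, -122.3088), ("WP2", 48.1000, -122.2820)]

alerts = find_discrepancies(stored_plan, reported_plan)
if alerts:
    print("possible data corruption or tampering:", alerts)
```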