
Showing papers on "Data Corruption published in 2021"


Journal ArticleDOI
TL;DR: This work introduces the blockchain to record the interactions among users, service providers, and organizers in the data auditing process as evidence, and employs a smart contract to detect service disputes, so as to force the untrusted organizer to honestly identify malicious service providers.
Abstract: Network storage services have benefited countless users worldwide due to their notable convenience, economy, and high availability. Since a single service provider is not always reliable enough, more complex multi-cloud storage systems have been developed to mitigate the data corruption risk. However, a data auditing scheme is still needed in multi-cloud storage to help users confirm the integrity of their outsourced data. Unfortunately, most of the corresponding schemes rely on trusted institutions such as a centralized third-party auditor (TPA) and a cloud service organizer, and it is difficult to identify malicious service providers after service disputes. Therefore, we present a blockchain-based multi-cloud storage data auditing scheme to protect data integrity and accurately arbitrate service disputes. We not only introduce the blockchain to record the interactions among users, service providers, and organizers in the data auditing process as evidence, but also employ a smart contract to detect service disputes, so as to force the untrusted organizer to honestly identify malicious service providers. We also use the blockchain network and homomorphic verifiable tags to achieve low-cost batch verification without a TPA. Theoretical analyses and experiments show that the scheme is effective in multi-cloud environments and its cost is acceptable.
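As an illustration of the batch-verification idea behind homomorphic verifiable tags, the sketch below implements a minimal privately verifiable tag scheme (tags of the form t_i = α·m_i + f_k(i) mod p), where the challenged provider returns a constant-size aggregate proof. This is only a toy stand-in under assumed parameters; the paper's actual construction additionally records interactions on a blockchain and delegates dispute detection to a smart contract.

```python
import hashlib
import secrets

P = 2**127 - 1   # public prime modulus (illustrative parameter; M127 is prime)

def prf(key, i):
    """Keyed pseudorandom function mapping a block index to a field element."""
    return int.from_bytes(hashlib.sha256(key + i.to_bytes(8, "big")).digest(), "big") % P

def tag_blocks(blocks, alpha, key):
    """Homomorphic verifiable tag per block: t_i = alpha * m_i + f_k(i) (mod P)."""
    return [(alpha * m + prf(key, i)) % P for i, m in enumerate(blocks)]

def prove(blocks, tags, challenge):
    """Provider aggregates the challenged blocks and tags into a constant-size proof."""
    mu = sum(c * blocks[i] for i, c in challenge) % P
    sigma = sum(c * tags[i] for i, c in challenge) % P
    return mu, sigma

def verify(proof, challenge, alpha, key):
    """Verifier checks sigma == alpha * mu + sum(c_i * f_k(i)) without seeing the blocks."""
    mu, sigma = proof
    expected = (alpha * mu + sum(c * prf(key, i) for i, c in challenge)) % P
    return sigma == expected

# toy run: tag 16 blocks, challenge 4 of them, then corrupt one block
blocks = [int.from_bytes(secrets.token_bytes(8), "big") for _ in range(16)]
alpha, key = secrets.randbelow(P), secrets.token_bytes(16)
tags = tag_blocks(blocks, alpha, key)
challenge = [(i, secrets.randbelow(P)) for i in secrets.SystemRandom().sample(range(16), 4)]
print(verify(prove(blocks, tags, challenge), challenge, alpha, key))   # True
blocks[challenge[0][0]] ^= 1                                           # silent corruption
print(verify(prove(blocks, tags, challenge), challenge, alpha, key))   # False
```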

58 citations


Journal ArticleDOI
TL;DR: Blockchain technology (BCT) could be a supplementary technology that supports the existing information exchange systems and improves the design liability control for contributing stakeholders and the auditability of the exchange records.

41 citations


Journal ArticleDOI
01 Jun 2021
TL;DR: Wang et al. propose a robust and auditable distributed data storage (Radds) scheme to support secure and dependable edge storage in edge computing; the scheme allows checking the integrity of data on distributed edge storage servers and guarantees data repairability in case of data corruption.
Abstract: With the widespread use of Internet-of-Things (IoT) devices, edge computing is becoming a popular technology for processing and storing data distributed at the edge of the networks. However, the new paradigm also faces a major security challenge: how to ensure the reliability and integrity of data in distributed edge storage servers? In this paper, we propose a robust and auditable distributed data storage (Radds) scheme to support secure and dependable edge storage in edge computing. Firstly, based on homomorphic verifiable authenticators and regenerating code techniques, the proposed scheme allows checking the integrity of data on distributed edge storage servers and guarantees data repairability in case of data corruption. Moreover, the server holding corrupted data can be deduced from the integrity proofs, and a proxy is introduced for data reparation to relieve edge nodes of the online burden and computation costs. Secondly, the proposed scheme protects the privacy of the original data from the third-party auditor by blinding the encoding coefficients with a keyed pseudorandom function. Thirdly, the proposed scheme supports flexible scalability, i.e., dynamic joining and exiting of edge nodes. Moreover, even if some data are temporarily not collected, they can still be added to the encoded data file efficiently, and integrity checking and data reparation can be performed normally. Finally, security analysis and performance evaluation demonstrate that the proposed scheme is secure and highly efficient.

6 citations


Journal ArticleDOI
TL;DR: In this article, a data-aware module is implemented in Hadoop that improves the clustering process and reduces the computational load on the server using a balanced and proxy encryption technique with the Cloud me tool, yielding optimized query time and resource usage.

4 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used Benford's law to detect inconsistencies in data on daily new cases of COVID-19 reported by 80 countries and found that data from 26 countries display severe non-conformity to the law (p < 0.01), which may suggest data corruption or manipulation.
Abstract: Reporting of daily new cases and deaths from COVID-19 is one of the main tools to understand and manage the pandemic. However, governments and health authorities worldwide follow divergent procedures when registering and reporting their data. Much of the bias in those procedures is influenced by economic and political pressures and may lead to intentional or unintentional data corruption, which can mask crucial information. Benford's law is a statistical phenomenon extensively used to detect data corruption in large data sets. Here, we used Benford's law to screen and detect inconsistencies in data on daily new cases of COVID-19 reported by 80 countries. Data from 26 countries display severe non-conformity to Benford's law (p < 0.01), which may suggest data corruption or manipulation.
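A minimal sketch of the kind of first-digit screening described here, assuming NumPy and SciPy are available; the exact statistical procedure and thresholds used by the authors may differ.

```python
import numpy as np
from scipy.stats import chisquare

BENFORD = np.log10(1 + 1 / np.arange(1, 10))     # expected first-digit frequencies

def first_digit(n):
    """Leading decimal digit of a positive count."""
    n = abs(int(n))
    while n >= 10:
        n //= 10
    return n

def benford_test(daily_new_cases):
    """Chi-square test of the observed first-digit distribution against Benford's law."""
    digits = [first_digit(x) for x in daily_new_cases if x > 0]
    observed = np.bincount(digits, minlength=10)[1:10]
    expected = BENFORD * observed.sum()
    return chisquare(observed, expected)          # low p-value suggests non-conformity

# toy run on synthetic, roughly exponential "daily new cases" (Benford-friendly)
rng = np.random.default_rng(1)
series = np.round(10 * np.exp(0.15 * np.arange(120)) * rng.lognormal(0, 0.2, 120))
print(benford_test(series))
```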

4 citations


Journal ArticleDOI
Xiaoyue Zhang, Jingfei He, Yunpei Li, Yue Chi, Yatong Zhou
TL;DR: The authors propose a tensor singular value decomposition (SVD) based data recovery method for wireless sensor networks, in which the data collected by the spatially distributed sensor nodes in each time slot are arranged in matrix form instead of vector form to further exploit the spatial correlation of the data.
Abstract: Due to hardware and network conditions, data collected in Wireless Sensor Networks usually suffer from loss and corruption. Most existing research works mainly consider the reconstruction of missing data without data corruption. However, the inevitable data corruption poses a great challenge to guaranteeing recovery accuracy. To address this problem, this letter proposes a data recovery method based on tensor singular value decomposition. Data collected by the spatially distributed sensor nodes in each time slot are arranged in matrix form instead of vector form to further exploit the spatial correlation of the data. Therefore, data collected in consecutive time slots form a three-way tensor. To avoid the influence of corruption on recovery accuracy, a Tensor Robust Principal Component Analysis model is developed to decompose the raw data tensor into a low-rank normal data tensor and a sparse error tensor. The recovery accuracy is further improved by incorporating a total variation constraint. Computer experiments corroborate that the proposed method significantly outperforms existing methods in recovery accuracy.

3 citations


Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the authors consider different users' data protection policies aimed at making the data resilient to co-residence attacks, including data partition with and without replication of the parts, and attack detection through an early warning mechanism.
Abstract: Virtualization technology, particularly the virtual machines (VMs) used in cloud computing systems, raises unique security and reliability risks for cloud users. This chapter focuses on resilience to one such risk, co-residence attacks, where a user's information in one VM can be accessed/stolen or corrupted through side channels by a malicious attacker's VM co-residing on the same physical server. Both users' and attackers' VMs are distributed among cloud servers at random. We consider different users' data protection policies aimed at making the data resilient to co-residence attacks, including data partition with and without replication of the parts, and attack detection through an early warning mechanism. Probabilistic models are suggested to derive the overall probabilities of an attacker's success in data theft and data corruption. Based on the suggested probabilistic evaluation models, optimization problems of obtaining the data partition/replication policy that balances data security, data reliability, and a user's overheads are formulated and solved, leading to the optimal data protection policy for achieving data resilience. The possible user's uncertainty about the number of attacker's VMs is taken into account. Numerical examples demonstrating the influence of different constraints on the optimal policy are presented.
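For intuition, the following sketch estimates, by Monte Carlo simulation under assumed uniform random VM placement, the probability that at least one attacker VM lands on a server holding one of the user's data parts. It is only a simplified illustration with hypothetical parameters; the chapter's probabilistic models distinguish data theft from data corruption and account for replication, early warning, and uncertainty about the number of attacker VMs.

```python
import random

def coresidence_prob(n_servers, n_user_parts, n_attacker_vms, trials=100_000):
    """Monte Carlo estimate of the chance that at least one attacker VM lands on a
    server that hosts one of the user's data parts, under uniform random placement."""
    hits = 0
    for _ in range(trials):
        user_servers = set(random.choices(range(n_servers), k=n_user_parts))
        attacker_servers = set(random.choices(range(n_servers), k=n_attacker_vms))
        if user_servers & attacker_servers:
            hits += 1
    return hits / trials

# hypothetical setting: 100 servers, data split into 3 parts, 10 attacker VMs
print(coresidence_prob(100, 3, 10))
```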

3 citations


Proceedings ArticleDOI
22 Mar 2021
TL;DR: In this article, the authors propose an efficient and robust data integrity verification scheme for large-scale data transfer between computing systems with high-performance storage devices, where the order of I/O operations is controlled to ensure the robustness of the integrity verification.
Abstract: Most of the data generated on high-performance computing systems are transferred to storage in remote systems for various purposes such as backup. To detect data corruption caused by network or storage failures during data transfer, the receiver system verifies data integrity by comparing the checksum of the data. However, the internal operation of the storage device is not sufficiently investigated in the existing end-to-end integrity verification techniques. In this paper, we propose an efficient and robust data integrity verification scheme for large-scale data transfer between computing systems with high-performance storage devices. To ensure the robustness of the integrity verification, we control the order of I/O operations. In addition, we parallelize checksum computing and overlap it with I/O operations to make the integrity verification efficient.
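The sketch below shows, under assumptions, one way to overlap checksum computation with I/O: file chunks are read sequentially while their SHA-256 digests are computed in a thread pool (hashlib releases the GIL for large buffers), and the per-chunk digests are then combined. The file name is hypothetical, and the paper's scheme additionally controls the ordering of I/O operations inside the storage device, which this toy example does not model.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8 * 1024 * 1024        # 8 MiB read size (illustrative)

def chunk_checksums(path, workers=4):
    """Read the file sequentially while the SHA-256 of each chunk is computed in a
    thread pool, so hashing overlaps with the next read."""
    futures = []
    with open(path, "rb") as f, ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            futures.append(pool.submit(hashlib.sha256, block))
    return [fut.result().hexdigest() for fut in futures]

def combined_checksum(chunk_digests):
    """Fold the per-chunk digests into one value the receiver can compare."""
    h = hashlib.sha256()
    for d in chunk_digests:
        h.update(bytes.fromhex(d))
    return h.hexdigest()

# sender and receiver each run: combined_checksum(chunk_checksums("transfer.dat"))
```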

2 citations


Book ChapterDOI
08 Jul 2021
TL;DR: In this article, the authors propose a novel method for using a PoW-based blockchain to ensure data integrity in cloud database management systems, which exploits a Distributed Hash Table and lightweight software agents that monitor changes made to cloud database storage nodes.
Abstract: This paper proposes a novel method for using a PoW-based Blockchain to ensure data integrity in cloud database management systems. Cloud platforms are used extremely widely for storing data and even hosting databases, and in many cases there is no convenient way for a client to check the integrity of the data stored in a cloud database. To solve this, we propose a technique based on an interaction between the cloud platform and a PoW-based Blockchain. This interaction exploits a Distributed Hash Table and lightweight software agents that monitor changes made to cloud database storage nodes. Data update operations are published by the agents as Blockchain log/audit transactions that propagate deep into the Blockchain network until they become immutably and cryptographically protected by it. The proposed method enables the Cloud Provider to manage metadata so that it can easily detect deliberate or accidental corruption of transactions and recover the transactions in case such a data corruption incident occurs.
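A toy sketch of the agent-side idea, assuming nothing beyond the Python standard library: each observed database update is published as a hash-chained log entry, so later tampering with a recorded entry is detectable. A real deployment, as the paper describes, would anchor these records in a PoW blockchain and distribute them via a Distributed Hash Table rather than this single in-memory chain.

```python
import hashlib
import json
import time

def _hash(record):
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """Hash-chained log standing in for the blockchain: each entry commits to the
    previous one, so silent modification of a past entry breaks verification."""
    def __init__(self):
        self.entries = []

    def publish(self, node_id, operation, row_digest):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {"node": node_id, "op": operation, "row": row_digest,
                 "ts": time.time(), "prev": prev}
        entry["hash"] = _hash({k: v for k, v in entry.items() if k != "hash"})
        self.entries.append(entry)

    def verify(self):
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or e["hash"] != _hash(body):
                return False
            prev = e["hash"]
        return True

# a monitoring agent publishes the digest of each update it observes
log = AuditLog()
log.publish("storage-node-1", "UPDATE users SET ...", hashlib.sha256(b"row-v2").hexdigest())
print(log.verify())                 # True
log.entries[0]["row"] = "tampered"  # deliberate or accidental corruption
print(log.verify())                 # False
```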

2 citations


Journal ArticleDOI
TL;DR: Erasure Coding (EC) is leveraged to propose a reliable storage correctness verification solution that guarantees the retrieval of evidence and minimizes the effect of server failure/unavailability; results show that the proposed solution is more efficient than well-known state-of-the-art verification schemes.
Abstract: Cloud storage services allow users to remotely store their data in a distributed environment and enjoy cloud applications ubiquitously. To maximize users' trust, such a service also integrates a verification mechanism that guarantees the correctness of the stored data. The storage application fragments the user data and stores them on multiple cloud storage servers. However, it suffers from expensive data aggregation computations while processing verification services, which inevitably poses a data integrity verification challenge. To avoid these expensive computations, we simplify the verification procedure without needing data aggregation, simply by storing the evidence fragments and data fragments across the datacenters. In distributed environments, the storage correctness verification mechanism depends on the availability of storage servers. Therefore, the challenge of proof/evidence availability may arise due to a server failure or data corruption, hence decreasing the reliability of storage correctness verification. Thus, the problem of proof reliability arises over the distributed data. A few techniques proposed in the literature provide data reliability; however, to the best of our knowledge, none of these existing works has considered proof reliability. To address this new issue of proof reliability, in this paper we leverage Erasure Coding (EC) to propose a reliable storage correctness verification solution that guarantees the retrieval of evidence and minimizes the effect of server failure/unavailability. The experimental results demonstrate that the proposed approach maintains reliability even after the loss of a certain number of fragments, ranging between 2 and 12 depending upon the number of parity fragments used in the EC scheme. Extensive experiments are performed in real time, and the results show that our proposed solution is more efficient than well-known state-of-the-art verification schemes.
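To illustrate why erasure-coded evidence fragments survive server unavailability, here is a deliberately simplified single-parity sketch (plain XOR, standard library only): one lost fragment can be rebuilt from the survivors. The paper's actual solution uses a full EC scheme with between 2 and 12 parity fragments, not this toy code.

```python
import hashlib

def split_with_parity(data, k=4):
    """Split data into k equal-size fragments plus one XOR parity fragment
    (a single-parity stand-in for the full Erasure Coding scheme)."""
    size = -(-len(data) // k)                      # ceil division
    frags = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = bytes(size)
    for frag in frags:
        parity = bytes(p ^ f for p, f in zip(parity, frag))
    return frags, parity

def recover(frags, parity, lost):
    """Rebuild one lost fragment by XOR-ing the parity with the surviving fragments."""
    rebuilt = parity
    for i, frag in enumerate(frags):
        if i != lost:
            rebuilt = bytes(r ^ f for r, f in zip(rebuilt, frag))
    return rebuilt

# the "evidence" here is just a digest of a user data fragment
evidence = hashlib.sha256(b"user data fragment").digest()
frags, parity = split_with_parity(evidence, k=4)
lost = 2                                           # one storage server becomes unavailable
restored = recover(frags, parity, lost)
assert restored == frags[lost]                     # evidence survives the fragment loss
```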

2 citations


Posted Content
TL;DR: In this paper, a man-in-the-middle-based attack scenario that intercepts process communication between control systems and field devices, employs false data injection techniques, and performs data corruption such as sending false commands to field devices is presented.
Abstract: With the increasing use of information and communication technology in electrical power grids, the security of energy supply is increasingly threatened by cyber-attacks. Traditional cyber-security measures, such as firewalls or intrusion detection/prevention systems, can be used as mitigation and prevention measures, but their effective use requires a deep understanding of the potential threat landscape and complex attack processes in energy information systems. Given the complexity and lack of detailed knowledge of coordinated, timed attacks in smart grid applications, we need information and insight into realistic attack scenarios in an appropriate and practical setting. In this paper, we present a man-in-the-middle-based attack scenario that intercepts process communication between control systems and field devices, employs false data injection techniques, and performs data corruption such as sending false commands to field devices. We demonstrate the applicability of the presented attack scenario in a physical smart grid laboratory environment and analyze the generated data under normal and attack conditions to extract domain-specific knowledge for detection mechanisms.

Proceedings ArticleDOI
17 Aug 2021
TL;DR: In this paper, a systematic experimental evaluation of the error detection and correcting scheme, which is suitable for complex network data hiding approaches, i.e., distributed network covert channels (DNCCs), was performed, which proved that the proposed solution guaranteed secret communication reliability even when faced with severe networking conditions up to 20% of data corruption while maintaining a stable covert data rate.
Abstract: Information hiding in communication networks has recently been gaining increased attention from the security community. This is because such techniques are a double-edged sword: on the one hand, they can be used, e.g., to enhance the privacy of Internet users, while on the other they can be utilized by malware developers to enable covert communication features in malicious software. This means that to understand the risks that data hiding poses, it is of utmost importance to study the inner workings of potential information hiding methods and accompanying mechanisms (e.g., those that provide reliability of such communications), as well as to develop effective and efficient countermeasures. That is why, in this paper, we perform a systematic experimental evaluation of an error detection and correction scheme suitable for complex network data hiding approaches, i.e., distributed network covert channels (DNCCs). The obtained results show that the proposed solution guarantees secret communication reliability even when faced with severe networking conditions with up to 20% data corruption, while maintaining a stable covert data rate.

Journal ArticleDOI
10 Apr 2021
TL;DR: Experimental results demonstrate that the proposed system outperforms existing approaches in safeguarding privacy and in efficient, safe search over encrypted distributed documents.
Abstract: In recent times, cloud storage has tended to be the primary means of storing external data. Defending the data in the cloud against attacks is the main challenge. Private or semi-private information has grown rapidly across information networks, and privacy safeguards have failed to keep pace with search mechanisms. In the field of information networks, privacy protection is an important factor when carrying out various data mining operations on encrypted data stored in different storage systems. A mechanism providing tolerance and protection against data corruption should be developed, which is difficult to achieve. Furthermore, as there is no adequate audit mechanism, the integrity of the stored data becomes questionable. In addition, user authentication is another challenge. Current solutions provide only a remote audit mechanism and require data owners to always remain online so that the auditing process is handled manually, which is sometimes unworkable. In this paper, we propose a new, regenerative, public audit methodology accompanied by third-party audits. The existing data search system provides one solution that can be used to maintain the confidentiality of indexing; however, documents are stored on a private server in plaintext form, which compromises privacy protection. To make the system more secure and efficient, we first store the documents in encrypted form on the server and use a Key Distribution Center (KDC). To generate keys, the KDC uses the user's biometric features. To improve the search experience, we also implement TF-IDF, which provides an efficient ranking of the results. Lastly, we carry out comprehensive experiments on data sets to evaluate the performance of our proposed system. Experimental results demonstrate that the proposed system outperforms existing approaches in safeguarding privacy and in efficient, safe search over encrypted distributed documents. The suggested methodology also includes a third-party auditing mechanism to ensure data integrity.
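As a hedged illustration of the TF-IDF ranking step only (not of the encryption, KDC, or auditing components), the following sketch uses scikit-learn on a toy plaintext corpus; in the proposed system the ranking would instead operate over a secure index built from the encrypted documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy plaintext corpus standing in for the document collection
docs = [
    "cloud storage audit ensures data integrity",
    "biometric keys generated by the key distribution center",
    "ranking encrypted documents for keyword search",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)

def rank(query, top_k=3):
    """Return (document index, score) pairs ordered by TF-IDF cosine similarity."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_matrix).ravel()
    return sorted(enumerate(scores), key=lambda t: -t[1])[:top_k]

print(rank("keyword search over encrypted documents"))
```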


Posted Content
TL;DR: The authors propose a diagnostic test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating models' meaning understanding capabilities, and apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI).
Abstract: Pre-trained neural language models achieve high performance on natural language inference (NLI) tasks, but whether they actually understand the meaning of the processed sequences remains unclear. We propose a new diagnostic test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating the models' meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to non-sensical sentence pairs. If model accuracy on the corrupted data remains high, the dataset is likely to contain statistical biases and artefacts that guide prediction. Inversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high-quality data for NLI tasks.
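A minimal sketch of what a "remove an entire word class" corruption could look like, assuming NLTK and its tokenizer/tagger models are installed; the authors' actual transformations for MNLI and ANLI may be defined differently.

```python
import nltk
# one-time setup in a fresh environment (assumption about available resources):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

def remove_word_class(sentence, pos_prefix="VB"):
    """Corruption transformation: drop every token whose POS tag starts with the
    given prefix (e.g. 'VB' removes all verbs), often yielding a non-sensical sentence."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return " ".join(tok for tok, tag in tagged if not tag.startswith(pos_prefix))

premise = "A man is playing a guitar on the street."
print(remove_word_class(premise))   # verbs removed; model accuracy on such pairs is then measured
```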

Book ChapterDOI
01 Jan 2021
TL;DR: A hash function is a mathematical function used to identify and authenticate file data; it can be used to check the integrity of digital evidence, and ensuring the validity of such evidence becomes crucial because data corruption in forensic evidence can tamper with it and mislead an investigation.
Abstract: A hash function is a mathematical function used to identify and authenticate file data. Data corruption in forensic evidence will tamper with the digital evidence and could mislead the investigation. To ensure information security, various identity and access management tools are necessary. Hashing plays a key role in such scenarios and can be used to check the integrity of the data. Ensuring the validity of such evidence becomes crucial. Validation can be performed at various stages, viz. collection, preservation, and analysis.
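A short sketch of the integrity check described here, using only Python's hashlib; the file path is hypothetical and stands in for a piece of collected evidence whose digest is recorded at acquisition and re-verified at preservation and analysis time.

```python
import hashlib

def file_digest(path, algo="sha256", chunk=1 << 20):
    """Hash a file in fixed-size chunks so large evidence images fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# "evidence.img" is a hypothetical acquired image: record its digest at collection,
# then re-compute and compare at later stages of the investigation.
acquisition_digest = file_digest("evidence.img")
assert acquisition_digest == file_digest("evidence.img"), "evidence altered or corrupted"
```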


Proceedings Article
27 Jul 2021
TL;DR: The authors build on robust principal component analysis (RPCA), which decomposes a data matrix into a low-rank and a sparse component, where the low-rank component represents the principal components and the sparse component accounts for the data corruption, and extend it to corruption that affects groups of measurements.
Abstract: It has long been known that principal component analysis (PCA) is not robust with respect to gross data corruption. This has been addressed by robust principal component analysis (RPCA). The first computationally tractable definition of RPCA decomposes a data matrix into a low-rank and a sparse component. The low-rank component represents the principal components, while the sparse component accounts for the data corruption. Previous works consider the corruption of individual entries or whole columns of the data matrix. In contrast, we consider a more general form of data corruption that affects groups of measurements. We show that the decomposition approach remains computationally tractable and allows the exact recovery of the decomposition when only the corrupted data matrix is given. Experiments on synthetic data corroborate our theoretical findings, and experiments on several real-world datasets from different domains demonstrate the wide applicability of our generalized approach.
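For reference, here is a compact sketch of the classic entry-wise RPCA decomposition (principal component pursuit solved by ADMM with singular value thresholding), using NumPy; it illustrates the low-rank plus sparse split the paper starts from, not the authors' group-corruption generalization.

```python
import numpy as np

def shrink(X, tau):
    """Entry-wise soft-thresholding operator."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Principal component pursuit via ADMM: M ~ L (low rank) + S (sparse)."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else m * n / (4.0 * np.abs(M).sum())
    S, Y = np.zeros_like(M), np.zeros_like(M)
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
        if np.linalg.norm(M - L - S, "fro") <= tol * norm_M:
            break
    return L, S

# toy example: low-rank data plus sparse gross corruption
rng = np.random.default_rng(0)
L0 = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 80))   # rank-5 data
S0 = np.zeros_like(L0)
mask = rng.random(L0.shape) < 0.05
S0[mask] = rng.normal(scale=20, size=mask.sum())            # 5% gross errors
L_hat, S_hat = rpca(L0 + S0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))      # small relative error
```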

Proceedings ArticleDOI
13 May 2021
TL;DR: In this paper, an algorithm for identifying the storage format of Earth remote sensing data is proposed; it involves converting a multispectral image into a one-dimensional digital signal, constructing the spectrum of the resulting signal, applying threshold processing, and analyzing the locations of the detected peaks.
Abstract: This work is devoted to the development of an algorithm for recovering unidentified multispectral remote monitoring data. It is noted that Earth remote sensing data may be damaged during storage, long-distance transmission, or preprocessing. The most difficult case of data corruption involves corrupted header information. It is shown that Fourier analysis of the image represented in row-vector format can be used to recover the header information. The main regularities of the Fourier transforms of such a signal are analyzed for the cases where it is stored in the BIL, BIP, and BSQ formats. An algorithm for identifying the storage format of Earth remote sensing data is proposed, which involves converting a multispectral image into a one-dimensional digital signal, constructing the spectrum of the resulting signal, applying threshold processing, and analyzing the locations of the detected peaks. The operation of the algorithm is demonstrated by determining the data storage format of the SPOT spacecraft for a typical fragment. The functioning of the algorithm when processing heterogeneous images is studied and recommendations for its use are developed. The results of processing real remote monitoring data obtained by various space-based systems are presented.
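The sketch below, on a synthetic image cube, only illustrates the pipeline stated in the abstract (flatten the multispectral image in a candidate interleaving, take the Fourier spectrum, inspect the strongest peaks); the peak-location rules the authors derive for distinguishing BIL, BIP, and BSQ are not reproduced here.

```python
import numpy as np

# hypothetical synthetic cube: 100 rows x 120 columns x 4 strongly correlated bands
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 120))
cube = np.stack([base * (b + 1) + rng.normal(scale=0.1, size=base.shape)
                 for b in range(4)], axis=-1)

def flatten(cube, fmt):
    """Serialize the (rows, cols, bands) cube in the byte order of the given format."""
    if fmt == "BIP":   # band interleaved by pixel
        return cube.reshape(-1)
    if fmt == "BIL":   # band interleaved by line
        return np.transpose(cube, (0, 2, 1)).reshape(-1)
    if fmt == "BSQ":   # band sequential
        return np.transpose(cube, (2, 0, 1)).reshape(-1)

def dominant_peaks(signal, k=5):
    """Indices of the strongest spectral components (DC removed)."""
    spec = np.abs(np.fft.rfft(signal - signal.mean()))
    return np.argsort(spec)[-k:][::-1]

for fmt in ("BIP", "BIL", "BSQ"):
    print(fmt, dominant_peaks(flatten(cube, fmt)))
```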

Journal ArticleDOI
TL;DR: Picket uses a self-supervised deep learning model for mixed-type tabular data, called PicketNet, to detect corrupted data and remove corrupted data points from the training data.
Abstract: Data corruption is an impediment to modern machine learning deployments. Corrupted data can severely bias the learned model and can also lead to invalid inferences. We present Picket, a simple framework to safeguard against data corruption during both training and deployment of machine learning models over tabular data. For the training stage, Picket identifies and removes corrupted data points from the training data to avoid obtaining a biased model. For the deployment stage, Picket flags, in an online manner, corrupted query points fed to a trained machine learning model that, due to noise, would result in incorrect predictions. To detect corrupted data, Picket uses a self-supervised deep learning model for mixed-type tabular data, which we call PicketNet. To minimize the burden of deployment, learning a PicketNet model does not require any human-labeled data. Picket is designed as a plugin that can increase the robustness of any machine learning pipeline. We evaluate Picket on a diverse array of real-world data considering different corruption models that include systematic and adversarial noise during both training and testing. We show that Picket consistently safeguards against corrupted data during both training and deployment of various models ranging from SVMs to neural networks, beating a diverse array of competing methods that span from data quality validation models to robust outlier detection models.
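The following is not PicketNet; it is a hedged stand-in that uses a generic outlier detector (scikit-learn's IsolationForest) on purely numerical data to show the shape of the training-stage step, i.e., flagging suspected corrupted rows and excluding them before model training. Picket itself targets mixed-type tabular data with a self-supervised model and also covers the deployment stage.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5))                  # clean numeric training data
idx = rng.choice(1000, size=50, replace=False)
data[idx] += rng.normal(scale=8, size=(50, 5))     # inject gross corruption into 5% of rows

detector = IsolationForest(contamination=0.05, random_state=0).fit(data)
flags = detector.predict(data) == -1               # -1 marks suspected corrupted points
print("flagged:", flags.sum(), "| fraction of injected rows caught:", flags[idx].mean())

train_clean = data[~flags]                         # corrupted points removed before training
```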

Book ChapterDOI
01 Jan 2021
TL;DR: This chapter presents the details of different models of information and data security, the management of risk, the challenges posed to security, different threats, and the importance of security.
Abstract: Information security and data security are processes aimed at preventing unauthorized access to information and protecting data from corruption throughout its lifecycle, whether the information is stored in a physical format, such as files that are subject to theft or vandalism, or in other forms. Since the very beginning of communication, everyone understood it was important to have a mechanism to preserve private information. In the nineteenth century, more complex systems were made to let governments manage information effectively. During the First World War, multi-tier systems were introduced to communicate information from one place to another. This study presents the details of different models of information and data security, the management of risk, the challenges posed to security, different threats, and the importance of security.