
Showing papers on "Data Corruption" published in 2017


Journal ArticleDOI
TL;DR: A two-phase MC-based data recovery scheme, named MC-Two-Phase, is proposed; it applies the matrix completion technique to fully exploit the inherent features of environmental data and recover the data matrix when data are either missing or corrupted.
Abstract: Affected by hardware and wireless conditions in WSNs, raw sensory data usually have notable data loss and corruption. Existing studies mainly consider the interpolation of random missing data in the absence of the data corruption. There is also no strategy to handle the successive missing data. To address these problems, this paper proposes a novel approach based on matrix completion (MC) to recover the successive missing and corrupted data. By analyzing a large set of weather data collected from 196 sensors in Zhu Zhou, China, we verify that weather data have the features of low-rank, temporal stability, and spatial correlation. Moreover, from simulations on the real weather data, we also discover that successive data corruption not only seriously affects the accuracy of missing and corrupted data recovery but even pollutes the normal data when applying the matrix completion in a traditional way. Motivated by these observations, we propose a novel Principal Component Analysis (PCA)-based scheme to efficiently identify the existence of data corruption. We further propose a two-phase MC-based data recovery scheme, named MC-Two-Phase, which applies the matrix completion technique to fully exploit the inherent features of environmental data to recover the data matrix due to either data missing or corruption. Finally, the extensive simulations with real-world sensory data demonstrate that the proposed MC-Two-Phase approach can achieve very high recovery accuracy in the presence of successively missing and corrupted data.
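The two-phase recovery idea in the abstract can be illustrated with a minimal sketch: flag suspicious entries with a PCA reconstruction error, then refill missing and flagged entries with a low-rank approximation. This is not the paper's algorithm; the function names, rank values, and threshold below are illustrative assumptions.

```python
import numpy as np

def pca_corruption_mask(X, n_components=3, z=3.0):
    """Flag entries whose reconstruction error under a low-rank PCA model is unusually large."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    approx = U[:, :n_components] * s[:n_components] @ Vt[:n_components]
    err = np.abs(Xc - approx)
    return err > z * err.std()

def low_rank_complete(X, mask, rank=3, n_iters=200):
    """Refill the entries selected by `mask` with a rank-`rank` SVD approximation, keeping trusted entries fixed."""
    Y = np.where(mask, X[~mask].mean(), X)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        Y = np.where(mask, U[:, :rank] * s[:rank] @ Vt[:rank], X)
    return Y

# Toy run: a low-rank "weather" matrix (196 sensors x 144 readings) with missing and corrupted entries.
rng = np.random.default_rng(0)
truth = rng.standard_normal((196, 6)) @ rng.standard_normal((6, 144))
X = truth.copy()
missing = rng.random(X.shape) < 0.2
X[missing] = np.nan
filled = np.where(missing, np.nanmean(X), X)                 # provisional fill for the PCA step
corrupted = pca_corruption_mask(filled)                      # phase 1: identify likely corruption
recovered = low_rank_complete(filled, missing | corrupted)   # phase 2: matrix-completion-style recovery
print("relative error:", np.linalg.norm(recovered - truth) / np.linalg.norm(truth))
```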

103 citations


Proceedings Article
27 Feb 2017
TL;DR: It is found that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability.
Abstract: We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. Our results have implications for the design of next generation fault-tolerant distributed and cloud storage systems.

57 citations


Journal ArticleDOI
TL;DR: This paper models and optimizes users' data protection policy, in which sensitive data are partitioned into several blocks to enhance data security and multiple replicas are further created for each block to provide data survivability, in a cloud environment subject to co-residence attacks.

37 citations


Journal ArticleDOI
TL;DR: This work makes original contributions by formulating and solving constrained optimization problems to balance the data theft and data corruption probabilities in cloud computing systems subject to co-resident attacks.

32 citations


Journal ArticleDOI
TL;DR: It is found that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability.
Abstract: We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous problems related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. We also find that the above outcomes arise due to fundamental problems in file-system fault handling that are common across many systems. Our results have implications for the design of next-generation fault-tolerant distributed and cloud storage systems.

22 citations


28 Jul 2017
TL;DR: Potential approaches to the storage and querying of Linked Data with varying degrees of decentralisation and guarantees of integrity, using distributed ledgers, are described, and their a priori differences in performance, storage limitations and reliability are discussed.
Abstract: Distributed ledger platforms based on blockchains provide a fully distributed form of data storage which can guarantee data integrity. Certain use cases, such as medical applications, can benefit from guarantees that the results of arbitrary queries against a Linked Dataset faithfully represent its contents as originally published, without tampering or data corruption. We describe potential approaches to the storage and querying of Linked Data with varying degrees of decentralisation and guarantees of integrity, using distributed ledgers, and discuss their a priori differences in performance, storage limitations and reliability, setting out a programme for future empirical research.
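One of the simpler points in the design space the abstract sketches is anchoring a digest of the published Linked Dataset on a ledger so that later copies can be checked against the original publication. The toy ledger, dataset, and names below are illustrative assumptions, not the authors' system.

```python
import hashlib
import json
import time

class ToyLedger:
    """Append-only list standing in for a distributed ledger (illustration only)."""
    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        record = dict(record, timestamp=time.time())
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append((digest, record))
        return digest

def dataset_digest(triples) -> str:
    """Canonical digest of a set of RDF-style triples (sorted so statement order does not matter)."""
    canonical = "\n".join(sorted(" ".join(t) for t in triples))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Publisher anchors the dataset digest on the ledger at publication time.
dataset = [("ex:alice", "ex:takes", "ex:aspirin"), ("ex:aspirin", "ex:dose", "100mg")]
ledger = ToyLedger()
ledger.append({"dataset": "ex:prescriptions", "digest": dataset_digest(dataset)})

# A consumer later checks a retrieved copy against the anchored digest.
published = ledger.entries[-1][1]["digest"]
tampered = dataset + [("ex:alice", "ex:takes", "ex:warfarin")]
print(dataset_digest(dataset) == published)    # True: faithful copy
print(dataset_digest(tampered) == published)   # False: tampering or corruption detected
```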

19 citations


Journal ArticleDOI
01 Aug 2017
TL;DR: This paper addresses the coverage and efficiency problems of data cleaning by introducing CleanM (pronounced clean'em), a language which can express multiple types of cleaning operations, and validates the applicability of CleanM on top of CleanDB, a newly designed and implemented framework which can query heterogeneous data.
Abstract: Data cleaning has become an indispensable part of data analysis due to the increasing amount of dirty data. Data scientists spend most of their time preparing dirty data before it can be used for data analysis. At the same time, the existing tools that attempt to automate the data cleaning procedure typically focus on a specific use case and operation. Still, even such specialized tools exhibit long running times or fail to process large datasets. Therefore, from a user's perspective, one is forced to use a different, potentially inefficient tool for each category of errors. This paper addresses the coverage and efficiency problems of data cleaning. It introduces CleanM (pronounced clean'em), a language which can express multiple types of cleaning operations. CleanM goes through a three-level translation process for optimization purposes; a different family of optimizations is applied in each abstraction level. Thus, CleanM can express complex data cleaning tasks, optimize them in a unified way, and deploy them in a scale-out fashion. We validate the applicability of CleanM by using it on top of CleanDB, a newly designed and implemented framework which can query heterogeneous data. When compared to existing data cleaning solutions, CleanDB a) covers more data corruption cases, b) scales better, and can handle cases for which its competitors are unable to terminate, and c) uses a single interface for querying and for data cleaning.

19 citations


Proceedings Article
01 Jan 2017
TL;DR: A new game theoretic data publication strategy is reported and its integration into the open source software ARX is evaluated, indicating that the implementation is scalable and can be combined with various data privacy risk and quality measures.
Abstract: Biomedical data continues to grow in quantity and quality, creating new opportunities for research and data-driven applications. To realize these activities at scale, data must be shared beyond its initial point of collection. To maintain privacy, healthcare organizations often de-identify data, but they assume worst-case adversaries, inducing high levels of data corruption. Recently, game theory has been proposed to account for the incentives of data publishers and recipients (who attempt to re-identify patients), but this perspective has been more hypothetical than practical. In this paper, we report on a new game theoretic data publication strategy and its integration into the open source software ARX. We evaluate our implementation with an analysis on the relationship between data transformation, utility, and efficiency for over 30,000 demographic records drawn from the U.S. Census Bureau. The results indicate that our implementation is scalable and can be combined with various data privacy risk and quality measures.

11 citations


Posted Content
TL;DR: Online and distributed robust regression approaches are proposed, both of which can concurrently address challenges including the computational infeasibility of handling an entire dataset at once, the existence of heterogeneously distributed corruption, and the difficulty of corruption estimation when data cannot be entirely loaded.
Abstract: In today's era of big data, robust least-squares regression becomes a more challenging problem when considering adversarial corruption along with the explosive growth of datasets. Traditional robust methods can handle the noise but suffer from several challenges when applied to huge datasets, including 1) computational infeasibility of handling an entire dataset at once, 2) existence of heterogeneously distributed corruption, and 3) difficulty in corruption estimation when data cannot be entirely loaded. This paper proposes online and distributed robust regression approaches, both of which can concurrently address all the above challenges. Specifically, the distributed algorithm optimizes the regression coefficients of each data block via heuristic hard thresholding and combines all the estimates in a distributed robust consolidation. Furthermore, an online version of the distributed algorithm is proposed to incrementally update the existing estimates with new incoming data. We also prove that our algorithms benefit from strong robustness guarantees in terms of regression coefficient recovery, with a constant upper bound on the error of state-of-the-art batch methods. Extensive experiments on synthetic and real datasets demonstrate that our approaches are superior to existing methods in effectiveness, with competitive efficiency.
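A minimal sketch of the general hard-thresholding idea described here, assuming a NumPy environment: each block refits least squares on the rows with the smallest residuals, and block estimates are consolidated with a coordinate-wise median. This illustrates the flavor of the approach, not the paper's algorithm or its guarantees.

```python
import numpy as np

def robust_regression_hard_threshold(X, y, keep_frac=0.8, n_iters=20):
    """Fit least squares on the subset of rows with the smallest residuals,
    re-selecting that subset each iteration (a generic hard-thresholding scheme)."""
    n = len(y)
    k = int(keep_frac * n)
    keep = np.arange(n)                       # start with all rows
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = np.abs(y - X @ beta)
        keep = np.argsort(resid)[:k]          # hard-threshold: keep the k best-fitting rows
    return beta

def distributed_consolidate(blocks, keep_frac=0.8):
    """Estimate coefficients per data block, then consolidate with a coordinate-wise median
    (one simple robust way to combine block estimates)."""
    estimates = [robust_regression_hard_threshold(Xb, yb, keep_frac) for Xb, yb in blocks]
    return np.median(np.vstack(estimates), axis=0)

# Toy usage: 4 blocks, ~10% adversarially corrupted responses per block.
rng = np.random.default_rng(1)
beta_true = np.array([2.0, -1.0, 0.5])
blocks = []
for _ in range(4):
    Xb = rng.standard_normal((500, 3))
    yb = Xb @ beta_true + 0.1 * rng.standard_normal(500)
    bad = rng.random(500) < 0.1
    yb[bad] += 20 * rng.standard_normal(bad.sum())   # heterogeneous corruption
    blocks.append((Xb, yb))
print(distributed_consolidate(blocks))               # close to beta_true despite the corruption
```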

11 citations


Book ChapterDOI
25 Mar 2017
TL;DR: The security level on the cloud platform has been increased: data stored on the server are more secure and integrity is maintained throughout the cloud platform.
Abstract: Many studies have derived ways to achieve security on the server and to integrate data across multiple servers by detecting misbehavior in the server. Data are typically secured on the server using encryption techniques and divided into fragments before being stored on the virtual cloud. This study takes a different perspective on storing data on a virtual cloud to maintain integrity: fragments of the address of the data are stored instead. Hence, the data remain secure, since only the address of the data is transmitted when divided into fragments and the data themselves are secured with encryption, so it would be difficult for a third party to decrypt and access them on the server. Thus, the security level on the cloud platform has been increased: data stored on the server are more secure and integrity is maintained throughout the cloud platform.

6 citations


Proceedings ArticleDOI
08 Jun 2017
TL;DR: This paper presents the idea of dependency trees, which should help to identify error sources in the event of a fault in systems whose emergent behavior leads to dynamic decision-making processes that can change at runtime.
Abstract: Cyber-physical systems (CPS) are interconnected systems that observe and manipulate real objects and processes. They allow dynamic extension and show emergent behavior which leads to dynamic decision-making processes that can change at runtime. They cannot always be easily understood because of the high number of components involved. If an error occurs in such a process, it is difficult to comprehend which component involved in the decision process is responsible for that error. The decision therefore has a high degree of dependency on the nodes involved in the process, and errors are not easily traceable to their original source. In this paper, we present the idea of dependency trees, which should help to identify error sources in the event of a fault.
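As a toy illustration of how a dependency tree could be walked to localize an error source (the paper's construction is not shown here; the `Node` class and health flags are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A component that contributed to a decision, plus the components it depended on."""
    name: str
    healthy: bool = True
    inputs: list = field(default_factory=list)

def suspect_sources(decision: Node):
    """Walk the dependency tree of a faulty decision and return the leaf-most unhealthy nodes."""
    suspects = []

    def subtree_healthy(node):
        return node.healthy and all(subtree_healthy(c) for c in node.inputs)

    def visit(node):
        bad_children = [c for c in node.inputs if not subtree_healthy(c)]
        if not node.healthy and not bad_children:
            suspects.append(node.name)      # unhealthy node with no unhealthy inputs: likely origin
        for c in bad_children:
            visit(c)

    visit(decision)
    return suspects

# Toy usage: a decision fed by two sensors through an aggregator.
t1 = Node("temperature_sensor_1")
t2 = Node("temperature_sensor_2", healthy=False)                 # corrupted reading
agg = Node("aggregator", healthy=False, inputs=[t1, t2])
decision = Node("open_valve_decision", healthy=False, inputs=[agg])
print(suspect_sources(decision))                                 # -> ['temperature_sensor_2']
```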


Proceedings ArticleDOI
02 Oct 2017
TL;DR: Online and distributed robust regression approaches are proposed that benefit from strong robustness guarantees in terms of regression coefficient recovery with a constant upper bound on the error of state-of-the-art batch methods.
Abstract: In today's era of big data, robust least-squares regression becomes a more challenging problem when considering adversarial corruption along with the explosive growth of datasets. Traditional robust methods can handle the noise but suffer from several challenges when applied to huge datasets, including 1) computational infeasibility of handling an entire dataset at once, 2) existence of heterogeneously distributed corruption, and 3) difficulty in corruption estimation when data cannot be entirely loaded. This paper proposes online and distributed robust regression approaches, both of which can concurrently address all the above challenges. Specifically, the distributed algorithm optimizes the regression coefficients of each data block via heuristic hard thresholding and combines all the estimates in a distributed robust consolidation. Furthermore, an online version of the distributed algorithm is proposed to incrementally update the existing estimates with new incoming data. We also prove that our algorithms benefit from strong robustness guarantees in terms of regression coefficient recovery, with a constant upper bound on the error of state-of-the-art batch methods. Extensive experiments on synthetic and real datasets demonstrate that our approaches are superior to existing methods in effectiveness, with competitive efficiency.

Proceedings ArticleDOI
17 Jul 2017
TL;DR: A forgery attack and a data corruption attack are proposed to demonstrate the insecurity of a previously proposed privacy-preserving public auditing mechanism, and a modified scheme based on that mechanism is proposed.
Abstract: Auditing mechanisms have received much attention from researchers due to the development of cloud storage. Recently, Wang et al. proposed a privacy-preserving public auditing mechanism for shared cloud data that supports group dynamics. However, we find some security flaws in their mechanism. In this paper, a forgery attack and a data corruption attack are proposed to demonstrate the insecurity. Then, we propose a modified scheme based on their mechanism. The security analysis demonstrates that our improvement can resist the two attacks mentioned.

Journal ArticleDOI
TL;DR: A novel orthogonal concatenated code and a cyclic redundancy check have been used to mitigate the effects of data corruption in the user data, and a novel memory management algorithm is proposed that helps to process the data at the back-end computing nodes, removing the added path delays.
Abstract: Due to the dramatic increase of data volume in modern high energy physics (HEP) experiments, a robust high-speed data acquisition (DAQ) system is very much needed to gather the data generated during different nuclear interactions. As the DAQ works under a harsh radiation environment, there is a fair chance of data corruption due to various energetic particles such as alpha particles, beta particles, or neutrons. Hence, a major challenge in the development of a DAQ for HEP experiments is to establish an error-resilient communication system between front-end sensors or detectors and back-end data processing computing nodes. Here, we have implemented the DAQ using a field-programmable gate array (FPGA) due to some of its inherent advantages over the application-specific integrated circuit. A novel orthogonal concatenated code and a cyclic redundancy check (CRC) have been used to mitigate the effects of data corruption in the user data. Scrubbing with a 32-bit CRC has been used against errors in the configuration memory of the FPGA. Data from front-end sensors will reach the back-end processing nodes through multiple stages that may add an uncertain amount of delay to the different data packets. We have also proposed a novel memory management algorithm that helps to process the data at the back-end computing nodes, removing the added path delays. To the best of our knowledge, the proposed FPGA-based DAQ utilizing an optical link with channel coding and efficient memory management modules can be considered the first of its kind. Performance estimation of the implemented DAQ system is done based on resource utilization, bit error rate, efficiency, and robustness to radiation.
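The CRC part of this protection is easy to illustrate in software; the sketch below only shows a CRC-32 check over a framed payload (the paper's orthogonal concatenated code and FPGA scrubbing are not modeled), with an illustrative frame size.

```python
import os
import zlib

def frame_payload(payload: bytes) -> bytes:
    """Append a 32-bit CRC so the receiver can detect corruption in transit."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes):
    """Return (payload, ok); ok is False if the stored CRC does not match the payload."""
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload, zlib.crc32(payload) == crc

frame = frame_payload(os.urandom(1024))               # a 1 KiB "event" from a front-end sensor
_, ok = check_frame(frame)
corrupted = bytearray(frame)
corrupted[10] ^= 0x40                                  # flip one bit, as a radiation upset might
_, ok_corrupted = check_frame(bytes(corrupted))
print(ok, ok_corrupted)                                # True False
```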

Proceedings ArticleDOI
01 Jun 2017
TL;DR: An enhanced study of cloud data services is presented; the security threats and the requirements for outsourcing data to a cloud server are discussed, and the open challenges and current research directions in each category of solution for data services are presented.
Abstract: Cloud computing is one of the fastest-growing fields for outsourcing data to cloud servers. Amid the rapid invention of cloud computing technologies, providing data services to industry, enterprises, and individuals in the cloud environment is a serious issue. Implementing security techniques in the cloud server is a very complex task due to the tremendous growth of the cloud environment. These cloud servers are hosted on the Internet to perform operations such as reading, writing, storing, and organizing data, rather than on a desktop computer. While processing data on a cloud server over an open network, businesses face massive security and privacy risks. The major threats include data leakage, data corruption, and loss of privacy in cloud environments. In recent times, several studies have been proposed to address these threats and provide solutions to protect cloud data. In this paper, we present an enhanced study of cloud data services; we also discuss the security threats and the requirements for outsourcing data to a cloud server. Finally, we present the open challenges and current research directions in each category of solution for data services.

Journal ArticleDOI
TL;DR: This research work aims to make the security data model in the secure element of NFC resistant to data corruption and eavesdropping using Diffie-Hellman.
Abstract: Nowadays smartphones have Near Field Communication (NFC), which can be used for transferring data, payment through a mobile gateway, and automation. With this new contactless technology set to become an important part of our lives, people have some valid and understandable security concerns. NFC is inherently vulnerable to data corruption and eavesdropping attacks. In this research work, we are trying to build a security data model in the secure element of NFC. Our aim is to secure against data corruption and eavesdropping using Diffie-Hellman.
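For reference, the key-agreement primitive named in the abstract looks like this; the group parameters below are toy values for illustration only, and a real NFC deployment would use standardized large groups plus authentication. This is not the paper's specific construction.

```python
import secrets

# Toy parameters: a small Mersenne prime for illustration only. A real deployment
# would use a standardized large group (e.g., RFC 3526) and authenticate the exchange.
p = 2**127 - 1
g = 5

a = secrets.randbelow(p - 2) + 2        # phone's private exponent
b = secrets.randbelow(p - 2) + 2        # reader's private exponent
A = pow(g, a, p)                        # public values exchanged over the NFC link
B = pow(g, b, p)

shared_phone = pow(B, a, p)             # both sides derive the same secret...
shared_reader = pow(A, b, p)
assert shared_phone == shared_reader    # ...which can key symmetric protection of the NFC payload
print(hex(shared_phone)[:18], "...")
```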


Journal ArticleDOI
TL;DR: This paper constructs a privacy-preserving auditing scheme that can audit cloud data during data migration, which can hugely reduce the migration cost when data corruption happens, and extends the scheme to support data update during auditing.
Abstract: Cloud storage has gained great attention in recent years. It brings many benefits as well as security issues. Cloud auditing technology can ensure the integrity of cloud users' data. However, it lacks efficiency when dealing with the migration scenario of a large amount of cloud data. In this paper, we propose an efficient cloud auditing scheme which supports data migration auditing. We first construct a privacy-preserving auditing scheme that can audit cloud data during data migration, which can hugely reduce the migration cost when data corruption happens. We further extend the scheme to support data update during auditing. By supporting batch auditing for multiple cloud users' migration auditing tasks, our scheme could hugely improve the efficiency. Performance analysis demonstrates that our auditing scheme is secure and efficient.

Patent
30 Aug 2017
TL;DR: In this article, the authors proposed a fast secure data destruction for NAND memory devices that renders data in a memory cell unreadable by performing only the pre-programming phase of the erase process.
Abstract: Disclosed in some examples are systems, methods, memory devices, and machine readable mediums for a fast secure data destruction for NAND memory devices that renders data in a memory cell unreadable. Instead of going through all the erase phases, the memory device may remove sensitive data by performing only the pre-programming phase of the erase process. Thus, the NAND doesn't perform the second and third phases of the erase process. This is much faster and results in data that cannot be reconstructed. In some examples, because the erase pulse is not actually applied and because this is simply a programming operation, data may be rendered unreadable at a per-page level rather than a per-block level as in traditional erases.
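A conceptual simulation of the idea, not a NAND driver: a full erase runs pre-program, erase pulse, and verify, while the fast destruction path stops after the pre-program phase, which already overwrites the stored pattern. The page size and byte conventions are illustrative.

```python
PAGE_SIZE = 4096

def pre_program(page: bytearray) -> None:
    """Phase 1: drive every cell toward the programmed state, destroying the old pattern."""
    page[:] = b"\x00" * PAGE_SIZE

def erase_pulse_and_verify(page: bytearray) -> None:
    """Phases 2-3: pull cells back to the erased state and verify (skipped by fast destruction)."""
    page[:] = b"\xff" * PAGE_SIZE

def fast_secure_destroy(page: bytearray) -> None:
    pre_program(page)              # data is unreadable after this point

def full_erase(page: bytearray) -> None:
    pre_program(page)
    erase_pulse_and_verify(page)

page = bytearray(b"secret-data" * 372 + b"\x00" * (PAGE_SIZE - 11 * 372))
fast_secure_destroy(page)
assert b"secret-data" not in page      # the sensitive pattern is gone without a full erase
```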

Book ChapterDOI
01 Jan 2017
TL;DR: This chapter proposes a secure provable data possession (SPDP) scheme for object storage systems which, through interactive verification and hierarchical structure optimization, greatly improves the security and efficiency of checking during the verification of metadata.
Abstract: Object storage systems provide an ocean of space within which a very large quantity of data objects can be stored reliably, giving clients a prime opportunity to obtain big data. In this chapter, we propose a secure provable data possession (SPDP) scheme for object storage systems, so as to avoid the high-risk security issues of big data applications. Through interactive verification and hierarchical structure optimization, we are able to greatly improve the security and efficiency of checking during the verification of metadata. In particular, a new secure protection strategy is presented for detecting data corruption and preventing data loss. Finally, we conduct a statistical experiment evaluation to test the performance of our strategy. The results of the experiments show that, under the approved directories' optimization and protection strategy, our scheme can guarantee object storage security and enhance the performance of storage data analysis. In this chapter, we emphasize the construction of security and efficiency of the object storage system and guarantee the ability of clients to access their resources flexibly. We first present our object storage system through composition analysis. Subsequently, the framework of SPDP is described using the SPDP definition, verification algorithm, and hierarchical structure. By optimizing approved directories, the distributed metadata can be efficiently updated or revoked. Moreover, a novel secure protection strategy is proposed for detecting data corruption and preventing data loss. Our experiment results also validate the effectiveness of our strategy.

Book ChapterDOI
17 Dec 2017
TL;DR: An analysis model named CDM (Critical Data Model) is proposed, which can compute the criticality of variables in programs and thereby reduce the redundancy of the reliable program.
Abstract: In modern life, software plays an increasingly important role, and ensuring the reliability of software is of particular importance. In space, Single Event Upsets occur because of the strong radiation effects of cosmic rays, which can lead to errors in software. In order to guarantee the reliability of software, many software-based fault tolerance methods have been proposed. The majority of them are based on data redundancy, which duplicates all data to prevent data corruption during software execution. However, this fault-tolerant approach makes all data redundant and increases memory and time overhead. Duplicating only critical variables can significantly reduce the memory and performance overheads, while still guaranteeing highly reliable results in terms of fault-tolerance improvement. In this paper, we propose an analysis model, named CDM (Critical Data Model), which can compute the criticality of variables in programs and thereby reduce redundancy in the reliable program. According to the experimental results, the model proposed in this paper can enhance the reliability of the software, reduce the time and memory cost, and improve the efficiency of the reliable program.
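The selective-duplication idea can be sketched as follows; the `Duplicated` wrapper and the bit-flip simulation are illustrative, and the paper's criticality computation itself is not shown.

```python
class Duplicated:
    """Keep a shadow copy of a critical value; reads compare the two copies to detect silent corruption."""
    def __init__(self, value):
        self._primary = value
        self._shadow = value

    def get(self):
        if self._primary != self._shadow:
            raise RuntimeError("critical variable corrupted (copies disagree)")
        return self._primary

    def set(self, value):
        self._primary = value
        self._shadow = value

# Only variables the analysis marks as critical pay the duplication cost.
altitude = Duplicated(10_000)       # critical: duplicated and checked
log_line_count = 0                  # non-critical: left unprotected

altitude.set(10_250)
print(altitude.get())

altitude._primary ^= 1 << 7         # simulate a single-event upset flipping one bit
try:
    altitude.get()
except RuntimeError as e:
    print("detected:", e)
```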

Patent
19 Jul 2017
TL;DR: In this article, an information handling system includes a first memory, a second memory, and a central processor, and the first memory includes a buffer to store uncorrected no action (UCNA) errors for the second memory.
Abstract: An information handling system includes a first memory, a second memory, and a central processor. The first memory includes a buffer to store uncorrected no action (UCNA) errors for the second memory. The central processor detects a memory data corruption in the second memory, stores a first UCNA error associated with the memory data corruption in the buffer implemented within the first memory, determines whether the buffer is full, and erases an oldest in time UCNA error from the buffer in response to the buffer being full.
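The buffer policy described here maps naturally onto a bounded FIFO; a minimal sketch, with the `UcnaErrorLog` name, capacity, and fields chosen for illustration:

```python
from collections import deque

class UcnaErrorLog:
    """Bounded log of uncorrected-no-action (UCNA) errors: when full, the oldest entry is dropped."""
    def __init__(self, capacity: int = 8):
        self._entries = deque(maxlen=capacity)   # deque evicts the oldest item automatically

    def record(self, address: int, syndrome: int) -> None:
        self._entries.append({"address": address, "syndrome": syndrome})

    def entries(self):
        return list(self._entries)

log = UcnaErrorLog(capacity=4)
for i in range(6):                         # six detected corruptions, capacity of four
    log.record(address=0x1000 + i * 0x40, syndrome=i)
print(log.entries())                       # only the four most recent errors remain
```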

Proceedings Article
Kyle Poore1
01 Jan 2017
TL;DR: The results show that while this method appears to be sensitive to motor noise, room reverberation and multipath effects, it has very low data corruption rates, which makes it suitable for use in some applications.
Abstract: We propose an alternative to Wi-Fi for robotic communication, as its increased use in a competition environment has led to highly overlapping and interfering networks. This interference often causes unreliable transmission of data, which affects teams' ability to coordinate complex behaviors. Our method uses fixed-length Dual Tone Multi Frequency (DTMF) messages and a basic packet structure designed to reduce data corruption as a result of noise. We conducted twelve different experiments varying the distance between robots and the message format, as well as whether the robots are walking or sitting silently. Methods for scheduling messages to avoid crosstalk were also developed and tested. The results show that while this method appears to be sensitive to motor noise, room reverberation and multipath effects, it has very low data corruption rates, which makes it suitable for use in some applications.
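A minimal sketch of fixed-length DTMF framing with a checksum, in the spirit of the packet structure described above; the symbol alphabet split, payload length, and checksum rule are illustrative assumptions rather than the team's actual format.

```python
DTMF_DIGITS = "0123456789ABCD"             # data symbols; '*' and '#' are reserved for framing
PAYLOAD_LEN = 8

def encode_packet(payload: str) -> str:
    """Frame a fixed-length payload with a '*' start marker, a 1-digit checksum, and a '#' end marker."""
    assert len(payload) == PAYLOAD_LEN and all(c in DTMF_DIGITS for c in payload)
    checksum = DTMF_DIGITS[sum(DTMF_DIGITS.index(c) for c in payload) % len(DTMF_DIGITS)]
    return "*" + payload + checksum + "#"

def decode_packet(packet: str):
    """Return the payload, or None if framing or the checksum indicates corruption."""
    if len(packet) != PAYLOAD_LEN + 3 or packet[0] != "*" or packet[-1] != "#":
        return None
    payload, checksum = packet[1:-2], packet[-2]
    if any(c not in DTMF_DIGITS for c in payload):
        return None
    expected = DTMF_DIGITS[sum(DTMF_DIGITS.index(c) for c in payload) % len(DTMF_DIGITS)]
    return payload if checksum == expected else None

pkt = encode_packet("42A7C019")
print(decode_packet(pkt))                   # '42A7C019'
print(decode_packet(pkt.replace("9", "3"))) # a garbled symbol fails the checksum -> None
```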

Patent
10 May 2017
TL;DR: A disaster-tolerant repair method and device for a text database are presented: it is detected whether data corruption exists in the database, and if so, disaster-tolerant repair is conducted according to whether the database is configured with a backup disaster-tolerant script.
Abstract: The invention provides a disaster-tolerant repair method and device for a text database. The method mainly comprises the steps of detecting whether data corruption exists in the database and, if it does, conducting disaster-tolerant repair on the database according to whether the database is configured with a backup disaster-tolerant script. By adopting this disaster-tolerant repair method, the lack of disaster-tolerant repair for text databases in the prior art is remedied without introducing new resources or a large increase in system cost, and the usability of the text database is improved.
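The branching logic of the claim (detect corruption, then either run the configured backup restore or salvage in place) can be sketched as follows; the paths, names, and record format are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

def is_corrupted(db_path: Path, checksum_path: Path) -> bool:
    """Compare the text database file against a previously recorded SHA-256 checksum."""
    recorded = checksum_path.read_text().strip()
    return hashlib.sha256(db_path.read_bytes()).hexdigest() != recorded

def repair(db_path: Path, checksum_path: Path, backup_path=None) -> str:
    """Detect corruption; restore from the configured backup if one exists,
    otherwise salvage the records that still match the expected format."""
    if not is_corrupted(db_path, checksum_path):
        return "healthy"
    if backup_path is not None and Path(backup_path).exists():
        shutil.copyfile(backup_path, db_path)          # the "backup disaster-tolerant script" branch
        return "restored-from-backup"
    # No backup configured: keep only lines with three '|'-separated fields (illustrative record format)
    good = [l for l in db_path.read_text(errors="ignore").splitlines() if l.count("|") == 2]
    db_path.write_text("\n".join(good) + "\n")
    return "salvaged"

# Example wiring (paths are illustrative):
# print(repair(Path("articles.db"), Path("articles.db.sha256"), backup_path="articles.db.bak"))
```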

Patent
11 Jan 2017
TL;DR: In this article, a processing method for asynchronous data consumption is presented, which includes the steps that after a first server successfully executes data consumption of a data item, a consumption success mail containing consumption data is sent to a second server and is stored in an appointed storage equipment.
Abstract: The invention discloses a processing method for asynchronous data consumption. The method includes the following steps: after a first server successfully executes data consumption of a data item, a consumption-success mail containing the consumption data is sent to a second server and stored on a designated storage device; after receiving the consumption-success mail, the second server carries out the corresponding service logic processing on the consumption data; if the service logic processing succeeds, the storage device is notified to cancel the consumption-success mail; if the service logic processing fails, a service-logic-processing-failure mail containing the consumption data is returned to the first server, and the storage device is notified to cancel the consumption-success mail; after receiving the failure mail, the first server compensates (rolls back) the consumption of the data corresponding to the data item. The probability of data corruption can be reduced, and the human cost of error checking is indirectly reduced.
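A minimal sketch of the message flow described in the claim, with illustrative class names; the "mail", storage, and compensation steps are modeled as plain method calls.

```python
class Storage:
    """Durable store for pending consumption-success mails."""
    def __init__(self):
        self.pending = {}
    def store(self, item_id, mail):
        self.pending[item_id] = mail
    def cancel(self, item_id):
        self.pending.pop(item_id, None)

class Server1:
    """Consumes data items and compensates them if downstream processing fails."""
    def __init__(self, storage):
        self.storage = storage
        self.consumed = set()
    def consume(self, item_id):
        self.consumed.add(item_id)                       # the actual data consumption
        mail = {"item": item_id, "status": "consumed"}
        self.storage.store(item_id, mail)                # durable record of the consumption
        return mail
    def compensate(self, item_id):
        self.consumed.discard(item_id)                   # roll the consumption back

class Server2:
    """Runs the service logic and acknowledges or rejects the consumption."""
    def __init__(self, storage, server1):
        self.storage = storage
        self.server1 = server1
    def handle(self, mail, business_logic_ok: bool):
        self.storage.cancel(mail["item"])                # the mail is cleared in both outcomes
        if not business_logic_ok:
            self.server1.compensate(mail["item"])        # a failure mail triggers compensation

storage = Storage()
s1 = Server1(storage)
s2 = Server2(storage, s1)
s2.handle(s1.consume("order-42"), business_logic_ok=False)
print(s1.consumed, storage.pending)                      # set() {} -> consumption rolled back, no stale mail
```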

Patent
31 May 2017
TL;DR: In this article, a method for resolving storage disk fault tolerance is proposed, where the storage object in a storage system is divided into multiple data blocks which are redundant to one another, and each data block is provided with a data check code.
Abstract: The invention discloses a method for storage disk fault tolerance. The method includes: storing a storage object in units of data blocks, wherein the storage object in a storage system is divided into multiple data blocks which are redundant to one another, and each data block is provided with a data check code; and dispersing the redundant data blocks to different disks of different nodes, wherein the redundant data blocks are used to handle disk failure and the check codes are used to handle bit errors. Compared with a traditional method in which data are checked for corruption only through data check codes, this method effectively solves the problem of data corruption caused by bit errors on the disks and improves the reliability and stability of the stored data. Since bit errors in the data blocks occur during writing, reading, and transmitting of the data, splitting the data into data blocks and adding check codes to the blocks effectively avoids the problem that data errors cannot be repaired after data migration, and extends how long the data can be reliably stored.
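A small sketch of the block-plus-check-code layout described above, assuming replicated blocks spread round-robin across disks and a CRC per block; the replica count, block size, and placement rule are illustrative.

```python
import zlib

def store_object(data: bytes, disks, block_size=4, replicas=2):
    """Split the object into blocks, attach a CRC to each block, and place each
    replica on a different disk (round-robin for illustration)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for idx, block in enumerate(blocks):
        for r in range(replicas):
            disk = disks[(idx + r) % len(disks)]
            disk[idx] = (block, zlib.crc32(block))
    return len(blocks)

def read_object(n_blocks, disks):
    """Reassemble the object, skipping replicas whose CRC check fails (bit errors)
    or whose disk no longer holds the block (disk failure)."""
    out = []
    for idx in range(n_blocks):
        for disk in disks:
            entry = disk.get(idx)
            if entry and zlib.crc32(entry[0]) == entry[1]:
                out.append(entry[0])
                break
        else:
            raise IOError(f"block {idx} unrecoverable on all disks")
    return b"".join(out)

disks = [{}, {}, {}]                              # three disks on different nodes
n = store_object(b"hello, fault tolerant world!", disks)
disks[0].clear()                                  # a whole disk fails...
disks[1][1] = (b"xxxx", disks[1][1][1])           # ...and another block suffers a bit error
print(read_object(n, disks))                      # b'hello, fault tolerant world!'
```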