
Proceedings ArticleDOI

Distributed data recovery architecture based on schema segregation

01 Mar 2017-pp 1238-1243

TL;DR: This paper proposes a decentralized disaster recovery structure instead of a single centralized DR, i.e. the databases are hosted on multiple servers, where each of these servers caters to one or more related schemas.

Abstract: The growing importance of large volumes of data, and of data analysis for business agility, has driven managerial decisions to ensure data security and disaster recovery options. This growing dependency has triggered extensive storage of data and the development of 'high availability' databases. One prevalent industry practice is to provide a data center (DC) where the on-going service runs under normal circumstances and another server at a Disaster Recovery center (DR) that contains a replica of the database, providing continuity of service if the primary database fails due to a catastrophe or some other unavoidable cause, such as damage to the database server or a hardware malfunction. In this paper, we propose a decentralized disaster recovery structure instead of a single centralized DR, i.e. the databases are hosted on multiple servers, where each of these servers caters to one or more related schemas. Since we have multiple servers, each catering to a different schema, we also propose a different replication scheme for each of these schemas depending on the nature of updates and usage. In the proposed architecture, we use a cyclic backup scheme, which provides better decentralization and is less vulnerable than a radial backup architecture.
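The cyclic versus radial distinction in the abstract can be sketched as a backup-assignment function. This is an illustrative sketch only, not the paper's implementation; the server names and unit helpers are invented.

```python
# Hypothetical sketch of the two backup topologies contrasted in the
# abstract. In the cyclic scheme each schema server replicates to its
# successor in a ring, so no single site holds every replica; in the
# radial scheme every server backs up to one central DR site.

def cyclic_backup_plan(servers):
    """Map each server to the peer that stores its backup (ring)."""
    n = len(servers)
    return {servers[i]: servers[(i + 1) % n] for i in range(n)}

def radial_backup_plan(servers, dr_site):
    """Centralized alternative: every server backs up to one DR site."""
    return {s: dr_site for s in servers}

servers = ["hr_db", "sales_db", "inventory_db", "finance_db"]
print(cyclic_backup_plan(servers))
# each server's replica lands on its ring successor:
# hr_db -> sales_db, sales_db -> inventory_db, ..., finance_db -> hr_db
```

Losing any one node in the cyclic plan costs at most one primary and one replica, whereas losing the DR site in the radial plan loses every replica at once.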



Citations
Journal ArticleDOI
TL;DR: This review paper discusses Cassandra’s existing deletion mechanism and presents some identified issues related to backup and recovery in the Cassandra database; several possible solutions to address backup and recovery, including recovery in case of disasters, have been reviewed.
Abstract: Cassandra is a NoSQL database having a peer-to-peer, ring-type architecture. Cassandra offers fault-tolerance, data replication for higher availability as well as ensures no single point of failure. Given that Cassandra is a NoSQL database, it is evident that it lacks research that has gone into comparatively older and more widely and broadly used SQL databases. Cassandra’s growing popularity in recent times gives rise to the need to address any security-related or recovery-related concerns associated with its usage. This review paper discusses Cassandra’s existing deletion mechanism and presents some identified issues related to backup and recovery in the Cassandra database. Further, failure detection and handling of failures such as node failure or data center failure have been explored in the paper. In addition, several possible solutions to address backup and recovery, including recovery in case of disasters, have been reviewed.

1 citation


Cites background from "Distributed data recovery architect..."

  • ...A distributed computer system can provide more computing power, and hence parallel computation (Bhattacharya et al., 2017)....

    [...]


References
Book
01 Aug 1990
TL;DR: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.
Abstract: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud computing and streaming data applications, has forced a renewal of interest in distributed and parallel data management, while, at the same time, requiring a rethinking of some of the traditional techniques. This book covers the breadth and depth of this re-emerging field. The coverage consists of two parts. The first part discusses the fundamental principles of distributed data management and includes distribution design, data integration, distributed query processing and optimization, distributed transaction management, and replication. The second part focuses on more advanced topics and includes discussion of parallel database systems, distributed object management, peer-to-peer data management, web data management, data stream systems, and cloud computing. New in this Edition: New chapters, covering database replication, database integration, multidatabase query processing, peer-to-peer data management, and web data management. Coverage of emerging topics such as data streams and cloud computing Extensive revisions and updates based on years of class testing and feedback Ancillary teaching materials are available.

2,328 citations


"Distributed data recovery architect..." refers background in this paper

  • ...Moreover, subsections of these databases may together form a distributed RDBMS and, when required, can provide service equivalent to that of a centralized one in the DC; this concept is proposed in [7]....

    [...]

Book ChapterDOI
12 Nov 2001
TL;DR: A simulation framework that is developed to enable comparative studies of alternative dynamic replication strategies and preliminary results obtained with this simulator show that significant savings in latency and bandwidth can be obtained if the access patterns contain a small degree of geographical locality.
Abstract: Dynamic replication can be used to reduce bandwidth consumption and access latency in high performance "data grids" where users require remote access to large files. Different replication strategies can be defined depending on when, where, and how replicas are created and destroyed. We describe a simulation framework that we have developed to enable comparative studies of alternative dynamic replication strategies. We present preliminary results obtained with this simulator, in which we evaluate the performance of five different replication strategies for three different kinds of access patterns. The data in this scenario is read-only, so there are no consistency issues involved. The simulation results show that significant savings in latency and bandwidth can be obtained if the access patterns contain a small degree of geographical locality.

493 citations

01 Jan 2001
TL;DR: A simulation framework that is developed to model a grid scenario, which enables comparative studies of alternative dynamic replication strategies for three different kinds of access patterns, and shows that the best strategy has significant savings in latency and bandwidth consumption if the access patterns contain a moderate amount of geographical locality.
Abstract: Physics experiments that generate large amounts of data need to be able to share it with researchers around the world. High performance grids facilitate the distribution of such data to geographically remote places. Dynamic replication can be used as a technique to reduce bandwidth consumption and access latency in accessing these huge amounts of data. We describe a simulation framework that we have developed to model a grid scenario, which enables comparative studies of alternative dynamic replication strategies. We present preliminary results obtained with this simulator, in which we evaluate the performance of six different replication strategies for three different kinds of access patterns. The simulation results show that the best strategy has significant savings in latency and bandwidth consumption if the access patterns contain a moderate amount of geographical locality.

116 citations


"Distributed data recovery architect..." refers background in this paper

  • ...In this context it is pointed out that dynamic replication [9] can be used to reduce bandwidth consumption and access latency in high performance “data grids” where users require remote access to large files....

    [...]

Proceedings ArticleDOI
16 Aug 2007
TL;DR: A new dynamic replication strategy based on the principle of local optimization is proposed, taking into account two important issues which bound the replication: storage capability of different nodes and the bandwidth between these nodes.
Abstract: Efficient data access is one way of improving the performance of the data grid. In order to speed up the data access and reduce bandwidth consumption, data grid replicates essential data in multiple locations. This paper studies data replication strategy in data grid, taking into account two important issues which bound the replication: storage capability of different nodes and the bandwidth between these nodes. We propose a new dynamic replication strategy based on the principle of local optimization. The data grid can achieve the global data access optimization through the interaction of the local optimization in the local optimization areas.
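The two bounds named in this abstract, per-node storage capacity and inter-node bandwidth, can be made concrete with a small sketch. This is a generic greedy placement heuristic for illustration, not the paper's local-optimization algorithm; all names and the unit file size are assumptions.

```python
# Illustrative sketch of replica placement bounded by the two
# constraints the cited strategy considers: free storage at each node
# and the bandwidth available to it. Greedy by (popularity * bandwidth);
# this stands in for the paper's local-optimization interactions.

def place_replicas(requests, storage_free, bandwidth):
    """Choose (node, file) replica placements under storage limits.

    requests:     {(node, file): access_count} observed locally
    storage_free: {node: free capacity in file-size units}
    bandwidth:    {node: relative bandwidth to the data source}
    """
    placements = []
    file_size = 1  # assume unit-size files for the sketch
    # Prefer hot files on well-connected nodes with spare storage.
    for (node, fname), hits in sorted(
            requests.items(),
            key=lambda kv: kv[1] * bandwidth[kv[0][0]],
            reverse=True):
        if storage_free[node] >= file_size:
            placements.append((node, fname))
            storage_free[node] -= file_size
    return placements
```

Each node applying such a rule over only its own observations is what lets local decisions approximate a global optimum without central coordination.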

28 citations


"Distributed data recovery architect..." refers background in this paper

  • ...Another research work considers the usage and update rates of the databases at the multiple sites; accordingly, different replication rates [8] have been proposed to ensure consistent replication as well as faster execution....

    [...]

Proceedings ArticleDOI
02 Oct 2006
TL;DR: A novel algorithm for deterministic thread scheduling based on the interception of synchronisation statements based on shared data are protected by mutexes and client requests are sent to all replicas in total order, which is superior to other existing approaches.
Abstract: Determinism is mandatory for replicating distributed objects with strict consistency guarantees. Multithreaded execution of method invocations is a source of nondeterminism, but helps to improve performance and avoids deadlocks that nested invocations can cause in a single-threaded execution model. This paper contributes a novel algorithm for deterministic thread scheduling based on the interception of synchronisation statements. It assumes that shared data are protected by mutexes and client requests are sent to all replicas in total order; requests are executed concurrently as long as they do not issue potentially conflicting synchronisation operations. No additional communication is required for granting locks in a consistent order in all replicas. In addition to reentrant mutex locks, the algorithm supports condition variables and time-bounded wait operations. An experimental evaluation shows that, in some typical usage patterns of distributed objects, the algorithm is superior to other existing approaches.

27 citations


"Distributed data recovery architect..." refers methods in this paper

  • ...Strategies for handling multiple threads in replicated objects [11] have been discussed, presenting the novel ADETS-MAT algorithm for deterministic thread scheduling....

    [...]