Journal Article

Application Checkpointing Technique for Self-Healing From Failures in Mobile Grid Computing

01 Apr 2019, Vol. 11, Iss. 2, pp. 50-62
TL;DR: A checkpointing-based failure handling technique is proposed to improve arrangement reliability and failure recovery time in a mobile grid (MG) network; it is tested on a grid of ubiquitously available Android-based mobile phones.
Abstract: A mobile grid (MG) consists of interconnected mobile devices that are used for high-performance computing. Fault tolerance is an important property of mobile computational grid systems for achieving superior arrangement reliability and faster recovery from failures. Because the failure of resources can be fatal to task execution, a fault tolerance service is essential for meeting QoS requirements in an MG. The faults that occur in an MG include link failure, node failure, task failure, and limited bandwidth. Detecting these failures can help in better utilisation of resources and in timely notification of the user in an MG environment. These failures result in the loss of computational results and data. Many algorithms and techniques have been proposed for failure handling in traditional grids. The authors propose a checkpointing-based failure handling technique to improve arrangement reliability and failure recovery time for the MG network. Experiments were conducted on a grid of ubiquitously available Android-based mobile phones.
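The abstract does not describe the authors' algorithm in detail, so the following is only a minimal sketch of the general idea behind application-level checkpointing, not the technique proposed in the paper. It assumes a task that sums a list of numbers, saves its progress to a local file at a fixed interval, and resumes from the last saved state after a simulated node failure. The names `save_checkpoint`, `load_checkpoint`, `run_task`, and the parameters `interval` and `fail_at` are hypothetical and chosen for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the old checkpoint.

    The rename is atomic, so a crash during the write leaves the previous
    checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_index": 0, "partial_sum": 0}
    with open(path) as handle:
        return json.load(handle)


def run_task(items, path, interval=10, fail_at=None):
    """Sum the items, checkpointing every `interval` items (illustrative only).

    `fail_at` simulates a node failure just before the item at that index
    is processed.
    """
    state = load_checkpoint(path)
    for index in range(state["next_index"], len(items)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated node failure")
        state["partial_sum"] += items[index]
        state["next_index"] = index + 1
        if state["next_index"] % interval == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["partial_sum"]
```

In this sketch, a task that fails partway through loses only the work done since its last checkpoint: calling `run_task` again with the same checkpoint path picks up from the saved index instead of restarting from zero. A shorter `interval` reduces the lost work but adds more write overhead, which is the usual trade-off in checkpointing schemes.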