Showing papers on "Consistency (database systems) published in 2011"


Journal ArticleDOI
06 May 2011
TL;DR: This paper examines a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers, and contrasts the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions.
Abstract: In this paper, we examine a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers. Originally motivated by Web 2.0 applications, these systems are designed to scale to thousands or millions of users doing updates as well as reads, in contrast to traditional DBMSs and data warehouses. We contrast the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions. These systems typically sacrifice some of these dimensions, e.g. database-wide transaction consistency, in order to achieve others, e.g. higher availability and scalability.

1,412 citations


Journal ArticleDOI
TL;DR: It is found that methods based on the degree-corrected stochastic block model are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not.
Abstract: Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not. On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs.
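
For orientation, the degree-corrected stochastic block model discussed above is commonly written (following Karrer and Newman, 2011; stated here as background rather than taken from the abstract) with independent Poisson edge counts:

\[ A_{ij} \sim \mathrm{Poisson}(\theta_i \theta_j \omega_{c_i c_j}), \qquad \sum_{i:\,c_i = r} \theta_i = 1 \ \text{for each community } r, \]

where \(c_i\) is the community label of node \(i\), \(\theta_i\) is a node-specific degree parameter, and \(\omega_{rs}\) governs the expected number of edges between communities \(r\) and \(s\). Forcing all \(\theta_i\) within a community to be equal recovers the standard block model as a special case, and consistency of a detection criterion means that the estimated labels agree with the true labels (up to permutation of community names) with probability tending to one as the network grows.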

294 citations


Journal ArticleDOI
TL;DR: A concept called behavioral profile is introduced that captures the essential behavioral constraints of a process model and is used for the definition of a formal notion of consistency which is less sensitive to model projections than common criteria of behavioral equivalence and allows for quantifying deviation in a metric way.
Abstract: Engineering of process-driven business applications can be supported by process modeling efforts in order to bridge the gap between business requirements and system specifications. However, diverging purposes of business process modeling initiatives have led to significant problems in aligning related models at different abstract levels and different perspectives. Checking the consistency of such corresponding models is a major challenge for process modeling theory and practice. In this paper, we take the inappropriateness of existing strict notions of behavioral equivalence as a starting point. Our contribution is a concept called behavioral profile that captures the essential behavioral constraints of a process model. We show that these profiles can be computed efficiently, i.e., in cubic time for sound free-choice Petri nets w.r.t. their number of places and transitions. We use behavioral profiles for the definition of a formal notion of consistency which is less sensitive to model projections than common criteria of behavioral equivalence and allows for quantifying deviation in a metric way. The derivation of behavioral profiles and the calculation of a degree of consistency have been implemented to demonstrate the applicability of our approach. We also report the findings from checking consistency between partially overlapping models of the SAP reference model.
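
To illustrate the kind of relations a behavioral profile records (this sketch derives them from example traces for readability; the paper computes them structurally from sound free-choice Petri nets in cubic time), each pair of activities is classified via the weak order relation: a is weakly before b if some execution runs a before b, and strict order, exclusiveness, or interleaving follow by case distinction.

```python
from itertools import combinations

def behavioral_profile(traces):
    """Approximate the behavioral profile of a process from example traces.

    weak_order holds (a, b) if a occurs before b in at least one trace.
    For each pair of activities, the profile relation is then:
      strict order   a -> b : (a, b) in weak order, (b, a) not
      exclusiveness  a  + b : neither (a, b) nor (b, a) in weak order
      interleaving   a || b : both (a, b) and (b, a) in weak order
    """
    activities = {x for t in traces for x in t}
    weak_order = set()
    for trace in traces:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                weak_order.add((a, b))

    profile = {}
    for a, b in combinations(sorted(activities), 2):
        ab, ba = (a, b) in weak_order, (b, a) in weak_order
        if ab and not ba:
            profile[(a, b)] = "->"    # strict order
        elif ba and not ab:
            profile[(a, b)] = "<-"    # inverse strict order
        elif not ab and not ba:
            profile[(a, b)] = "+"     # exclusive
        else:
            profile[(a, b)] = "||"    # interleaving
    return profile

# Two traces of a small ordering process (hypothetical activity names)
print(behavioral_profile([("register", "check", "ship"),
                          ("register", "reject")]))
```

A degree of consistency between two models can then be quantified, in the spirit of the paper, as the fraction of shared activity pairs whose profile relations coincide.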

224 citations


Journal ArticleDOI
TL;DR: The PMC-Index, a way of classifying and a method of evaluating policy modeling, enables policy-makers and researchers to identify the level of consistency as well as the strengths and weaknesses of any policy model.

200 citations


Journal ArticleDOI
TL;DR: An automated approach is presented for detecting and tracking inconsistencies in real time (while the model changes); it is quick, correct, scalable, fully automated, and easy to use, as it does not require any special skills from the engineers using it.
Abstract: Software models typically contain many inconsistencies and consistency checkers help engineers find them. Even if engineers are willing to tolerate inconsistencies, they are better off knowing about their existence to avoid follow-on errors and unnecessary rework. However, current approaches do not detect or track inconsistencies fast enough. This paper presents an automated approach for detecting and tracking inconsistencies in real time (while the model changes). Engineers only need to define consistency rules, in any language, and our approach automatically identifies how model changes affect these consistency rules. It does this by observing the behavior of consistency rules to understand how they affect the model. The approach is quick, correct, scalable, fully automated, and easy to use as it does not require any special skills from the engineers using it. We evaluated the approach on 34 models with model sizes of up to 162,237 model elements and 24 types of consistency rules. Our empirical evaluation shows that our approach requires only 1.4 ms to reevaluate the consistency of the model after a change (on average); its performance is not noticeably affected by the model size and common consistency rules but only by the number of consistency rules, at the expense of a quite acceptable, linearly increasing memory consumption.
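
A minimal sketch of the underlying idea, with a toy model encoding and rule format invented purely for illustration (not the tool's actual interface): while a rule is evaluated, record which model elements it reads; after a change, re-evaluate only the rules whose recorded scope contains a changed element.

```python
class IncrementalChecker:
    """Toy incremental consistency checker. Each rule is a function that
    reads model elements through an `access` callback; the elements it
    touches form its scope, and only rules whose scope contains a changed
    element are re-evaluated after a model change."""

    def __init__(self, model, rules):
        self.model = model       # dict: element id -> properties
        self.rules = rules       # dict: rule name -> rule function
        self.scopes = {}         # rule name -> set of element ids read
        self.results = {}        # rule name -> bool (rule satisfied?)
        for name in rules:
            self._evaluate(name)

    def _evaluate(self, name):
        scope = set()
        def access(elem_id):
            scope.add(elem_id)                 # observe what the rule reads
            return self.model[elem_id]
        self.results[name] = self.rules[name](access)
        self.scopes[name] = scope

    def element_changed(self, elem_id, new_props):
        self.model[elem_id] = new_props
        for name, scope in self.scopes.items():
            if elem_id in scope:               # affected rules only
                self._evaluate(name)
        return self.results

# Hypothetical UML-style rule: a message name must match an operation of
# the receiving class.
model = {"msg1": {"name": "getPos", "receiver": "cls1"},
         "cls1": {"operations": ["getPosition"]}}
rule = lambda access: access("msg1")["name"] in access("cls1")["operations"]
checker = IncrementalChecker(model, {"message-matches-operation": rule})
print(checker.results)                                        # rule violated
print(checker.element_changed("msg1", {"name": "getPosition",
                                       "receiver": "cls1"}))  # rule satisfied
```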

135 citations


Proceedings ArticleDOI
30 Mar 2011
TL;DR: It is demonstrated that a Paxos-based replicated state machine implementing a storage service can achieve performance close to the limits of the underlying hardware while tolerating arbitrary machine restarts, some permanent machine or disk failures and a limited set of Byzantine faults.
Abstract: Conventional wisdom holds that Paxos is too expensive to use for high-volume, high-throughput, data-intensive applications. Consequently, fault-tolerant storage systems typically rely on special hardware, semantics weaker than sequential consistency, a limited update interface (such as append-only), primary-backup replication schemes that serialize all reads through the primary, clock synchronization for correctness, or some combination thereof. We demonstrate that a Paxos-based replicated state machine implementing a storage service can achieve performance close to the limits of the underlying hardware while tolerating arbitrary machine restarts, some permanent machine or disk failures and a limited set of Byzantine faults. We also compare it with two versions of primary-backup. The replicated state machine can serve as the data store for a file system or storage array. We present a novel algorithm for ensuring read consistency without logging, along with a sketch of a proof of its correctness.

127 citations


Proceedings ArticleDOI
06 Jun 2011
TL;DR: This work addresses two important problems related to the consistency properties in a history of operations on a read/write register: detecting violations online as soon as they happen, and quantifying their severity via two quantities, the staleness of the reads and the commonality of violations.
Abstract: Motivated by the increasing popularity of eventually consistent key-value stores as a commercial service, we address two important problems related to the consistency properties in a history of operations on a read/write register (i.e., the start time, finish time, argument, and response of every operation). First, we consider how to detect a consistency violation as soon as one happens. To this end, we formulate a specification for online verification algorithms, and we present such algorithms for several well-known consistency properties. Second, we consider how to quantify the severity of the violations, if a history is found to contain consistency violations. We investigate two quantities: one is the staleness of the reads, and the other is the commonality of violations. For staleness, we further consider time-based staleness and operation-count-based staleness. We present efficient algorithms that compute these quantities. We believe that addressing these problems helps both key-value store providers and users adopt data consistency as an important aspect of key-value store offerings.
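
To make the staleness quantities concrete, here is a small sketch under one plausible formalization, assumed for illustration rather than taken from the paper: the time-based staleness of a read is how long the value it returns had already been superseded by a completed write when the read started.

```python
from collections import namedtuple

# One operation of a read/write register history: kind is "write" or "read",
# value is the argument written or the response returned.
Op = namedtuple("Op", "kind value start finish")

def time_based_staleness(history):
    """For each read, return how long (in time units) the returned value had
    already been overwritten when the read started; 0 means it was fresh.
    Assumes every write uses a distinct value."""
    writes = [op for op in history if op.kind == "write"]
    result = []
    for op in history:
        if op.kind != "read":
            continue
        returned = next(w for w in writes if w.value == op.value)
        # Writes that completed before the read started yet are newer than
        # the write whose value was returned make that value stale.
        newer = [w for w in writes
                 if w.finish < op.start and w.finish > returned.finish]
        result.append(max((op.start - w.finish for w in newer), default=0))
    return result

history = [Op("write", 1, 0, 1), Op("write", 2, 2, 3), Op("read", 1, 5, 6)]
print(time_based_staleness(history))   # [2]: value 1 was two units stale
```

Operation-count-based staleness can be obtained analogously by counting, instead of timing, the completed writes that the read missed.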

118 citations


Journal ArticleDOI
TL;DR: An approach based on answer set programming is presented to check the consistency of large-scale data sets, and the methodology is extended to provide explanations for inconsistencies by determining minimal representations of conflicts.
Abstract: We introduce an approach to detecting inconsistencies in large biological networks by using answer set programming. To this end, we build upon a recently proposed notion of consistency between biochemical/genetic reactions and high-throughput profiles of cell activity. We then present an approach based on answer set programming to check the consistency of large-scale data sets. Moreover, we extend this methodology to provide explanations for inconsistencies by determining minimal representations of conflicts. In practice, this can be used to identify unreliable data or to indicate missing reactions.

98 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: A novel method is developed for achieving multi-label multi-instance image annotation, where image-level (bag-level) labels and region-level (instance-level) labels are both obtained and the associations between semantic concepts and visual features are mined both at the image level and at the region level.
Abstract: In this paper, each image is viewed as a bag of local regions and is also investigated globally. A novel method is developed for achieving multi-label multi-instance image annotation, where image-level (bag-level) labels and region-level (instance-level) labels are both obtained. The associations between semantic concepts and visual features are mined both at the image level and at the region level. Inter-label correlations are captured by a co-occurrence matrix of concept pairs. The cross-level label coherence encodes the consistency between the labels at the image level and the labels at the region level. The associations between visual features and semantic concepts, the correlations among the multiple labels, and the cross-level label coherence are sufficiently leveraged to improve annotation performance. A structural max-margin technique is used to formulate the proposed model, and multiple interrelated classifiers are learned jointly. To leverage the available image-level labeled samples for the model training, the region-level label identification on the training set is first accomplished by building the correspondences between the multiple bag-level labels and the image regions. JEC-distance-based kernels are employed to measure the similarities both between images and between regions. Experimental results on real image datasets MSRC and Corel demonstrate the effectiveness of our method.

85 citations


Journal ArticleDOI
01 Jan 2011-Database
TL;DR: A community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore, is proposed to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards.
Abstract: The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.

84 citations


Journal ArticleDOI
TL;DR: A framework that enables balancing consistency and expert judgment is proposed, and a linearization process for streamlining the trade-off between expert reliability and synthetic consistency is developed.

Journal ArticleDOI
TL;DR: It is shown how a 3D city model can be decomposed into components which are each topologically equivalent to a disk, a sphere, or a torus, enabling the modeling of the terrain, of buildings and other constructions, and of bridges and tunnels.
Abstract: Consistency is a crucial prerequisite for a large number of relevant applications of 3D city models, which have become more and more important in GIS. Users need efficient and reliable consistency checking tools in order to be able to assess the suitability of spatial data for their applications. In this paper we provide the theoretical foundations for such tools by defining an axiomatic characterization of 3D city models. These axioms are effective and efficiently supported by recent spatial database management systems and methods of Computational Geometry or Computer Graphics. They are equivalent to the topological concept of the 3D city model presented in this paper, thereby guaranteeing the reliability of the method. Hence, each error is detected by the axioms, and each violation of the axioms is in fact an error. This property, which is proven formally, is not guaranteed by existing approaches. The efficiency of the method stems from its locality: in most cases, consistency checks can safely be restricted to single components, which are defined topologically. We show how a 3D city model can be decomposed into such components which are either topologically equivalent to a disk, a sphere, or a torus, enabling the modeling of the terrain, of buildings and other constructions, and of bridges and tunnels, which are handles from a mathematical point of view. This enables a modular design of the axioms by defining axioms for each topological component and for the aggregation of the components. Finally, a sound, consistent concept for aggregating features, i.e. semantical objects like buildings or rooms, to complex features is presented.
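
One classical ingredient of such local, component-wise checks, shown here only as an illustration and not as a summary of the paper's axiom set, is the Euler characteristic chi = V - E + F, which separates the topological components mentioned above: chi = 2 for a closed surface of sphere type, chi = 0 for a torus (one handle), and chi = 1 for a disk.

```python
def euler_characteristic(faces):
    """Compute chi = V - E + F for a polygonal surface given as a list of
    faces, each face a tuple of vertex ids in boundary order."""
    vertices = {v for face in faces for v in face}
    edges = set()
    for face in faces:
        for i, v in enumerate(face):
            w = face[(i + 1) % len(face)]
            edges.add(frozenset((v, w)))      # undirected edge
    return len(vertices) - len(edges) + len(faces)

# A cube (8 vertices, 12 edges, 6 quadrilateral faces) is of sphere type.
cube = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
        (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]
print(euler_characteristic(cube))   # 2
```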

Patent
18 Mar 2011
TL;DR: A multi-user, elastic, on-demand, distributed relational database management system is described in this article, where the database is fragmented into distributed objects called atoms, and any change to a copy of an atom at one location is replicated to all other locations containing a copy of that atom.
Abstract: A multi-user, elastic, on-demand, distributed relational database management system. The database is fragmented into distributed objects called atoms. Any change to a copy of an atom at one location is replicated to all other locations containing a copy of that atom. Transactional managers operate to satisfy the properties of atomicity, consistency, isolation, and durability.

Proceedings ArticleDOI
21 May 2011
TL;DR: This work describes a technique and support tool that allows for semi-automated checking of natural language and semi-formal requirements models, supporting both consistency management between representations and correctness and completeness analysis.
Abstract: Requirements specifications need to be checked against the 3C's - Consistency, Completeness and Correctness - in order to achieve high quality. This is especially difficult when working with both natural language requirements and associated semi-formal modelling representations. We describe a technique and support tool that allows us to perform semi-automated checking of natural language and semi-formal requirements models, supporting both consistency management between representations and correctness and completeness analysis. We use a concept of essential use case interaction patterns to perform the correctness and completeness analysis on the semi-formal representation. We highlight potential inconsistencies, incompleteness and incorrectness using visual differencing in our support tool. We have evaluated our approach via an end user study which focused on the tool's usefulness, ease of use, ease of learning and user satisfaction and provided data for cognitive dimensions of notations analysis of the tool.

Journal ArticleDOI
TL;DR: A new temporal-dependency-based checkpoint selection strategy is developed which can select checkpoints in accordance with different temporal constraints and can improve the efficiency of overall temporal verification significantly over the existing representative strategies.
Abstract: In a scientific workflow system, a checkpoint selection strategy is used to select checkpoints along scientific workflow execution for verifying temporal constraints so that we can identify any temporal violations and handle them in time in order to ensure overall temporal correctness of the execution that is often essential for the usefulness of execution results. The problem of existing representative strategies is that they do not differentiate temporal constraints as, once a checkpoint is selected, they verify all temporal constraints. However, such a checkpoint does not need to be taken for those constraints whose consistency can be deduced from others. The corresponding verification of such constraints is consequently unnecessary and can severely impact overall temporal verification efficiency while the efficiency determines whether temporal violations can be identified quickly for handling in time. To address the problem, in this article, we develop a new temporal-dependency based checkpoint selection strategy which can select checkpoints in accordance with different temporal constraints. With our strategy, the corresponding unnecessary verification can be avoided. The comparison and experimental simulation further demonstrate that our new strategy can improve the efficiency of overall temporal verification significantly over the existing representative strategies.

Journal Article
TL;DR: This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case, and describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs and monotonic DAGs.
Abstract: Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs and monotonic DAGs. It discusses some properties needed to implement non-trivial CRDTs.
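
To make the state-based case concrete, here is a sketch of one of the simplest container CRDTs with both add and remove, the two-phase set (2P-Set): the state is a pair of grow-only sets, merge is element-wise union, and since union is commutative, associative, and idempotent, replicas that have received the same updates converge regardless of delivery order. The sketch is illustrative only; the paper covers richer types such as OR-Sets, graphs, and monotonic DAGs.

```python
class TwoPhaseSet:
    """State-based 2P-Set CRDT: an element can be added and later removed,
    but never re-added (removals win). State only grows, so merge = union."""

    def __init__(self):
        self.added = set()
        self.removed = set()      # tombstones

    def add(self, x):
        self.added.add(x)

    def remove(self, x):
        if x in self.added:       # only previously added elements
            self.removed.add(x)

    def lookup(self, x):
        return x in self.added and x not in self.removed

    def merge(self, other):
        """Join with another replica's state: commutative, associative, and
        idempotent, hence eventual convergence without coordination."""
        self.added |= other.added
        self.removed |= other.removed

# Two replicas update independently, then exchange and merge their states.
a, b = TwoPhaseSet(), TwoPhaseSet()
a.add("x"); b.add("x"); b.remove("x"); b.add("y")
a.merge(b); b.merge(a)
print(a.lookup("x"), a.lookup("y"))   # False True, identical on both replicas
```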

Journal ArticleDOI
TL;DR: A lower bound of Ω(t) is proved on the number of writes needed to implement a read-only transaction of t items that successfully terminates in a disjoint-access parallel TM implementation; the result assumes strict serializability and thus holds under the assumption of opacity.
Abstract: Transactional memory (TM) is a popular approach for alleviating the difficulty of programming concurrent applications; TM guarantees that a transaction, consisting of a sequence of operations, appears to be executed atomically. Two fundamental properties of TM implementations are disjoint-access parallelism and the invisibility of read operations. Disjoint-access parallelism ensures that operations on disconnected data do not interfere, and thus it is critical for TM scalability. The invisibility of read operations means that their implementation does not write to the memory, thereby reducing memory contention. This paper proves an inherent tradeoff for implementations of transactional memories: they cannot be both disjoint-access parallel and have read-only transactions that are invisible and always terminate successfully. In fact, a lower bound of Ω(t) is proved on the number of writes needed in order to implement a read-only transaction of t items, which successfully terminates in a disjoint-access parallel TM implementation. The results assume strict serializability and thus hold under the assumption of opacity. It is shown how to extend the results to hold also for weaker consistency conditions, snapshot isolation and serializability.

Proceedings ArticleDOI
09 Sep 2011
TL;DR: Version consistency of distributed transactions is proposed as a safe criterion for dynamic reconfiguration, and an initial assessment through simulation shows the benefits of the approach with respect to timeliness and low degree of disruption.
Abstract: There is an increasing demand for the runtime reconfiguration of distributed systems in response to changing environments and evolving requirements. Reconfiguration must be done in a safe and low-disruptive way. In this paper, we propose version consistency of distributed transactions as a safe criterion for dynamic reconfiguration. Version consistency ensures that distributed transactions are served as if they were operating on a single coherent version of the system despite possible reconfigurations that may happen meanwhile. The paper also proposes a distributed algorithm to maintain dynamic dependences between components at architectural level and enable low-disruptive version-consistent dynamic reconfigurations. An initial assessment through simulation shows the benefits of the proposed approach with respect to timeliness and low degree of disruption.

Proceedings ArticleDOI
16 Oct 2011
TL;DR: A formal synchronization framework with bidirectional update propagation operations is presented; it is generated from a triple graph grammar (TGG), which specifies the language of all consistently integrated source and target models.
Abstract: Triple graph grammars (TGGs) have been used successfully to analyze correctness and completeness of bidirectional model transformations, but a corresponding formal approach to model synchronization has been missing. This paper closes this gap by providing a formal synchronization framework with bidirectional update propagation operations. They are generated from a TGG, which specifies the language of all consistently integrated source and target models. As a main result, we show that the generated synchronization framework is correct and complete, provided that forward and backward propagation operations are deterministic. Correctness essentially means that the propagation operations preserve consistency. Moreover, we analyze the conditions under which the operations are inverse to each other. All constructions and results are motivated and explained by a small running example using concrete visual syntax and abstract syntax notation based on typed attributed graphs.

Journal ArticleDOI
TL;DR: The paper shows that, rather than being something to be avoided, inconsistent heuristics often add a diversity of heuristic values into a search, which can lead to a reduction in the number of node expansions, contrary to the common perception in the AI literature.

Proceedings ArticleDOI
30 Jun 2011
TL;DR: EMFMigrate, a comprehensive approach to the metamodel co-evolution problem, is proposed, and a prospective and unifying characterization is given with the intent of clarifying the main difficulties and outlining the basic requirements for possible solutions.
Abstract: Metamodels can be considered one of the cardinal concepts of Model-Driven Engineering, one on which a number of coordinated entities, such as models, transformations, and tools, depend. Like any software artifact, metamodels are prone to evolution during their lifetime. As a consequence, whenever a metamodel changes, any related entity must be consistently adapted to preserve its well-formedness, consistency, or intrinsic correctness. This paper discusses the problem of co-adapting models, transformations, and tools. Different aspects are taken into account, and a prospective and unifying characterization is given with the intent of clarifying the main difficulties and outlining the basic requirements for possible solutions. In this respect, EMFMigrate, a comprehensive approach to the metamodel co-evolution problem, is proposed.

Patent
30 Jun 2011
TL;DR: A method and a system are presented for replaying full-scale production database workload using network or kernel capture; the captured workload is pre-processed and replayed to a test system with full transactional integrity.
Abstract: The present invention relates to a method and a system for replaying full-scale production database workload using network or kernel capture. In one embodiment, the capture of the server workload is done using network capture or using kernel drivers. The captured workload is then pre-processed and replayed to a test system along with full transactional integrity.

Proceedings Article
Lukasz Golab, Theodore Johnson
01 Jan 2011
TL;DR: This paper develops a theory of temporal consistency for stream warehouses that allows for multiple consistency levels and shows how to restrict query answers to a given consistency level and how warehouse maintenance can be optimized using knowledge of the consistency levels required by materialized views.
Abstract: A stream warehouse is a Data Stream Management System (DSMS) that stores a very long history, e.g. years or decades; or equivalently a data warehouse that is continuously loaded. A stream warehouse enables queries that seamlessly range from realtime alerting and diagnostics to long-term data mining. However, continuously loading data from many different and uncontrolled sources into a real-time stream warehouse introduces a new consistency problem: users want results in as timely a fashion as possible, but “stable” results often require lengthy synchronization delays. In this paper we develop a theory of temporal consistency for stream warehouses that allows for multiple consistency levels. We show how to restrict query answers to a given consistency level and we show how warehouse maintenance can be optimized using knowledge of the consistency levels required by materialized views.
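
As a simplified illustration of restricting query answers to a consistency level (the level names and the per-partition bookkeeping below are assumptions made for the sketch, not the paper's exact definitions), a warehouse can tag each time partition with how settled it is and let a query state the weakest level it tolerates:

```python
from enum import IntEnum

class Level(IntEnum):
    OPEN = 0      # data may still arrive or be revised
    CLOSED = 1    # all sources have reported; late corrections still possible
    STABLE = 2    # partition is guaranteed not to change anymore

class StreamWarehouse:
    def __init__(self):
        self.partitions = {}   # partition id -> (consistency level, rows)

    def load(self, partition, rows, level):
        self.partitions[partition] = (level, rows)

    def query(self, min_level):
        """Return rows only from partitions at or above the requested level,
        trading timeliness for stability of the answer."""
        return [row
                for level, rows in self.partitions.values()
                if level >= min_level
                for row in rows]

w = StreamWarehouse()
w.load(partition=0, rows=[("t0", 5)], level=Level.STABLE)
w.load(partition=1, rows=[("t1", 7)], level=Level.OPEN)
print(w.query(Level.STABLE))   # only the settled partition
print(w.query(Level.OPEN))     # everything, at the risk of later revisions
```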

Proceedings ArticleDOI
10 Nov 2011
TL;DR: This paper describes an approach for the automatic generation of quick fixes for DSMLs, taking a set of domain-specific constraints and model manipulation policies as input, and relying on state-space exploration techniques to find sequences of operations that decrease the number of inconsistencies.
Abstract: Domain-specific modeling languages (DSML) proved to be an important asset in creating powerful design tools for domain experts. Although these tools are capable of preserving the syntax-correctness of models even during free-hand editing, they often lack the ability of maintaining model consistency for complex language-specific constraints. Hence, there is a need for a tool-level automatism to assist DSML users in resolving consistency violation problems. In this paper, we describe an approach for the automatic generation of quick fixes for DSMLs, taking a set of domain-specific constraints and model manipulation policies as input. The computation relies on state-space exploration techniques to find sequences of operations that decrease the number of inconsistencies. Our approach is illustrated using a BPMN case study, and it is evaluated by several experiments to show its feasibility and performance.
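
A minimal sketch of the state-space exploration idea, with a toy model encoding, constraint set, and operation set invented for illustration: breadth-first search over sequences of editing operations, reporting as candidate quick fixes those sequences that reduce the number of violated constraints.

```python
from collections import deque

def quick_fixes(model, constraints, operations, max_depth=3):
    """Breadth-first search for operation sequences that lower the number of
    violated constraints; each such sequence is a candidate quick fix."""
    def violations(m):
        return sum(1 for c in constraints if not c(m))

    start = violations(model)
    queue, seen, fixes = deque([(model, [])]), {repr(model)}, []
    while queue:
        state, path = queue.popleft()
        if len(path) >= max_depth:
            continue
        for name, op in operations.items():
            nxt = op(state)
            if repr(nxt) in seen:
                continue
            seen.add(repr(nxt))
            if violations(nxt) < start:
                fixes.append(path + [name])       # candidate quick fix found
            else:
                queue.append((nxt, path + [name]))
    return fixes

# Toy BPMN-like constraint: a splitting gateway needs at least two outgoing flows.
model = {"gateway_outgoing": 1}
constraints = [lambda m: m["gateway_outgoing"] >= 2]
operations = {"add outgoing flow":
              lambda m: {**m, "gateway_outgoing": m["gateway_outgoing"] + 1}}
print(quick_fixes(model, constraints, operations))   # [['add outgoing flow']]
```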

Book ChapterDOI
12 Dec 2011
TL;DR: CacheGenie provides high-level caching abstractions for common query patterns in web applications based on Object-Relational Mapping (ORM) frameworks and shows that it takes little effort for application developers to use CacheGenie, and that it improves throughput by 2-2.5× for read-mostly workloads in Pinax.
Abstract: Caching is an important technique in scaling storage for high-traffic web applications. Usually, building caching mechanisms involves significant effort from the application developer to maintain and invalidate data in the cache. In this work we present CacheGenie, a caching middleware which makes it easy for web application developers to use caching mechanisms in their applications. CacheGenie provides high-level caching abstractions for common query patterns in web applications based on Object-Relational Mapping (ORM) frameworks. Using these abstractions, the developer does not have to worry about managing the cache (e.g., insertion and deletion) or maintaining consistency (e.g., invalidation or updates) when writing application code. We design and implement CacheGenie in the popular Django web application framework, with PostgreSQL as the database backend and memcached as the caching layer. To automatically invalidate or update cached data, we use triggers inside the database. CacheGenie requires no modifications to PostgreSQL or memcached. To evaluate our prototype, we port several Pinax web applications to use our caching abstractions. Our results show that it takes little effort for application developers to use CacheGenie, and that CacheGenie improves throughput by 2-2.5× for read-mostly workloads in Pinax.
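
The design point worth noting is that invalidation is pushed into the database rather than scattered across application code. A toy, in-process stand-in for that idea (not CacheGenie's actual Django/PostgreSQL/memcached machinery): each cached key registers the tables it depends on, and a write to a table fires a "trigger" that drops the dependent entries.

```python
class TriggerInvalidatedCache:
    """Toy stand-in for trigger-driven cache invalidation: cached keys record
    the tables they depend on, and a write to a table 'fires a trigger' that
    evicts the dependent entries, so application code never invalidates."""

    def __init__(self):
        self.cache = {}          # key -> cached query result
        self.dependents = {}     # table -> set of cache keys to invalidate

    def get_or_compute(self, key, tables, compute):
        if key not in self.cache:
            self.cache[key] = compute()
            for table in tables:
                self.dependents.setdefault(table, set()).add(key)
        return self.cache[key]

    def on_write(self, table):
        """Called for every write to `table`, like a database trigger."""
        for key in self.dependents.pop(table, set()):
            self.cache.pop(key, None)

db = {"comments": ["first!"]}
cache = TriggerInvalidatedCache()
latest = lambda: list(db["comments"][-10:])
print(cache.get_or_compute("latest-comments", ["comments"], latest))  # cached
db["comments"].append("second")
cache.on_write("comments")                                 # trigger invalidates
print(cache.get_or_compute("latest-comments", ["comments"], latest))  # fresh
```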

Patent
25 Jul 2011
TL;DR: In this article, the authors present a method for processing a stream of events at a local device, correcting out-of-order events based on a set of operators that defines multiple levels of consistency.
Abstract: The claimed subject matter provides a method for processing a stream of events. The method includes receiving a stream of events at a local device. The stream of events is associated with the local device. Further, the stream of events includes one or more out-of-order events. The method also includes executing a first complex event processing query against the stream of events. The stream of events is processed based on multiple levels of consistency defined by a set of operators. Additionally, the method includes correcting the out-of-order events based on the set of operators. A first output is generated in which consistency is guaranteed based on the corrected out-of-order events. The method also includes sending the first output to a server that performs complex event processing on the output.
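
One generic way such out-of-order input is often corrected, shown as a sketch only and not as a reproduction of the claimed operator set, is a reorder buffer: events are held until a low watermark promises that nothing earlier can still arrive, then released in timestamp order.

```python
import heapq

class ReorderBuffer:
    """Buffer events keyed by timestamp and release them in order once the
    low watermark (a promise that no earlier event will arrive) passes them."""

    def __init__(self):
        self.heap = []

    def insert(self, timestamp, event):
        heapq.heappush(self.heap, (timestamp, event))

    def advance_watermark(self, watermark):
        released = []
        while self.heap and self.heap[0][0] <= watermark:
            released.append(heapq.heappop(self.heap))
        return released   # in timestamp order; out-of-order arrival corrected

buf = ReorderBuffer()
for ts, ev in [(3, "c"), (1, "a"), (2, "b"), (5, "e")]:   # arrives out of order
    buf.insert(ts, ev)
print(buf.advance_watermark(3))   # [(1, 'a'), (2, 'b'), (3, 'c')]
print(buf.advance_watermark(6))   # [(5, 'e')]
```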

Journal ArticleDOI
01 Aug 2011
TL;DR: In this article, causal behavioural profiles are proposed as the basis for a consistency notion; these profiles capture essential behavioural information, such as order, exclusiveness, and causality between pairs of activities of a process model.
Abstract: Analysis of behavioural consistency is an important aspect of software engineering. In process and service management, consistency verification of behavioural models has manifold applications. For instance, a business process model used as system specification and a corresponding workflow model used as implementation have to be consistent. Another example would be the analysis to what degree a process log of executed business operations is consistent with the corresponding normative process model. Typically, existing notions of behaviour equivalence, such as bisimulation and trace equivalence, are applied as consistency notions. Still, these notions are exponential in computation and yield a Boolean result. In many cases, however, a quantification of behavioural deviation is needed along with concepts to isolate the source of deviation. In this article, we propose causal behavioural profiles as the basis for a consistency notion. These profiles capture essential behavioural information, such as order, exclusiveness, and causality between pairs of activities of a process model. Consistency based on these profiles is weaker than trace equivalence, but can be computed efficiently for a broad class of models. In this article, we introduce techniques for the computation of causal behavioural profiles using structural decomposition techniques for sound free-choice workflow systems if unstructured net fragments are acyclic or can be traced back to S- or T-nets. We also elaborate on the findings of applying our technique to three industry model collections.

Journal ArticleDOI
TL;DR: A novel metadata distribution policy, Dynamic Dir-Grain (DDG), is proposed, which seeks to balance the requirements of keeping namespace locality and even distribution of the load by dynamic partitioning of the namespace into size-adjustable hierarchical units; a consistent metadata processing protocol, S2PC-MP, is also proposed, which ensures fast recovery and greatly reduces fail-free execution overheads.
Abstract: Most supercomputers nowadays are based on large clusters, which call for sophisticated, scalable, and decentralized metadata processing techniques. From the perspective of maximizing metadata throughput, an ideal metadata distribution policy should automatically balance the namespace locality and even distribution without manual intervention. None of the existing metadata distribution schemes is designed to make such a balance. We propose a novel metadata distribution policy, Dynamic Dir-Grain (DDG), which seeks to balance the requirements of keeping namespace locality and even distribution of the load by dynamic partitioning of the namespace into size-adjustable hierarchical units. Extensive simulation and measurement results show that DDG policies with a proper granularity significantly outperform traditional techniques such as the Random policy and the Subtree policy by 40 percent to 62 times. In addition, from the perspective of file system reliability, metadata consistency is an equally important issue. However, it is complicated by dynamic metadata distribution. Metadata consistency of cross-metadata-server operations cannot be solved by traditional metadata journaling on each server. While the traditional two-phase commit (2PC) algorithm can be used, it is too costly for distributed file systems. We propose a consistent metadata processing protocol, S2PC-MP, which combines the two-phase commit algorithm with metadata processing to reduce overheads. Our measurement results show that S2PC-MP not only ensures fast recovery, but also greatly reduces fail-free execution overheads.