Journal ArticleDOI

CLOTHO: directed test generation for weakly consistent database systems

10 Oct 2019 - Vol. 3, pp. 1-28
TL;DR: This paper presents a novel testing framework for detecting serializability violations in (SQL) database-backed Java applications executing on weakly-consistent storage systems and is the first automated test generation facility for identifying serializability anomalies of Java applications intended to operate in geo-replicated distributed environments.
Abstract: Relational database applications are notoriously difficult to test and debug. Concurrent execution of database transactions may violate complex structural invariants that constrain how changes to the contents of one (shared) table affect the contents of another. Simplifying the underlying concurrency model is one way to ameliorate the difficulty of understanding how concurrent accesses and updates can affect database state with respect to these sophisticated properties. Enforcing serializable execution of all transactions achieves this simplification, but it comes at a significant price in performance, especially at scale, where database state is often replicated to improve latency and availability. To address these challenges, this paper presents a novel testing framework for detecting serializability violations in (SQL) database-backed Java applications executing on weakly-consistent storage systems. We manifest our approach in a tool, CLOTHO, that combines a static analyzer and model checker to generate abstract executions, discover serializability violations in these executions, and translate them back into concrete test inputs suitable for deployment in a test environment. To the best of our knowledge, CLOTHO is the first automated test generation facility for identifying serializability anomalies of Java applications intended to operate in geo-replicated distributed environments. An experimental evaluation on a set of industry-standard benchmarks demonstrates the utility of our approach.
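To make the targeted class of bugs concrete, here is a minimal, hypothetical JDBC sketch (the table and method names are illustrative, not taken from CLOTHO or its benchmarks): two concurrent copies of the same read-modify-write transaction can interleave under weak isolation so that one update is lost, a serializability violation of exactly the kind the tool searches for.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical read-modify-write transaction over accounts(id INT PRIMARY KEY, balance INT).
// If two clients run addInterest() concurrently under weak isolation, both may read the
// same balance and the later UPDATE silently overwrites the earlier one (lost update).
public class AccountTxn {
    static void addInterest(Connection conn, int accountId, int delta) throws SQLException {
        conn.setAutoCommit(false);
        int balance;
        try (PreparedStatement read =
                 conn.prepareStatement("SELECT balance FROM accounts WHERE id = ?")) {
            read.setInt(1, accountId);
            try (ResultSet rs = read.executeQuery()) {
                rs.next();
                balance = rs.getInt(1);          // both transactions may observe the same value
            }
        }
        try (PreparedStatement write =
                 conn.prepareStatement("UPDATE accounts SET balance = ? WHERE id = ?")) {
            write.setInt(1, balance + delta);    // the second committer overwrites the first
            write.setInt(2, accountId);
            write.executeUpdate();
        }
        conn.commit();                           // both commits can succeed on a weak store
    }
}
```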
Citations
Journal ArticleDOI
15 Oct 2021
TL;DR: MonkeyDB as mentioned in this paper is a mock storage system for testing storage-backed applications under multiple isolation levels, and it uses a logical specification of the isolation level to compute, on a read operation, the set of all possible return values.
Abstract: Modern applications, such as social networking systems and e-commerce platforms, are centered around using large-scale storage systems for storing and retrieving data. In the presence of concurrent accesses, these storage systems trade off isolation for performance. The weaker the isolation level, the more behaviors a storage system is allowed to exhibit, and it is up to the developer to ensure that their application can tolerate those behaviors. However, these weak behaviors only occur rarely in practice and outside the control of the application, making it difficult for developers to test the robustness of their code against weak isolation levels. This paper presents MonkeyDB, a mock storage system for testing storage-backed applications. MonkeyDB supports a key-value interface as well as SQL queries under multiple isolation levels. It uses a logical specification of the isolation level to compute, on a read operation, the set of all possible return values. MonkeyDB then returns a value randomly from this set. We show that MonkeyDB provides good coverage of weak behaviors, which is complete in the limit. We test a variety of applications for assertions that fail only under weak isolation. MonkeyDB is able to break each of those assertions in a small number of attempts.
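A minimal sketch of the core mechanism described above, assuming a toy key-value interface (the class and method names are illustrative, not MonkeyDB's API): a read gathers every value the isolation specification would allow it to observe and returns one at random.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Toy mock store: instead of a single "current" value, a read picks uniformly among
// all values it is permitted to observe. A real tool would filter this candidate set
// using the axioms of the chosen isolation level; here every prior write is visible.
public class MockWeakStore {
    private final Map<String, List<String>> writes = new HashMap<>();
    private final Random rng = new Random();

    public void write(String key, String value) {
        writes.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public String read(String key) {
        List<String> visible = writes.getOrDefault(key, List.of());
        if (visible.isEmpty()) return null;               // nothing written yet
        return visible.get(rng.nextInt(visible.size()));  // randomly chosen weak behavior
    }
}
```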

7 citations

Proceedings ArticleDOI
30 Mar 2020
TL;DR: A novel framework is introduced that automates the full evaluation process, including failure injection, and emphasizes reproducibility; the results show that distributed DBMSs are not necessarily available even if sufficient replicas are available and that clients can experience significant downtimes.
Abstract: Cloud resources have become a preferred operational model for distributed Database Management Systems (DBMS), offering elasticity and virtually unlimited scalability, but they increase the risk of failures as cluster sizes grow. While distributed DBMS provide high-availability mechanisms, it is currently an open research question to what extent they are able to provide availability and performance guarantees in case of cloud resource failures, especially as existing DBMS benchmarks do not consider availability. We present a comprehensive methodology for evaluating the availability of distributed DBMS in case of cloud resource failures. Based on this methodology, we introduce a novel framework that automates the full evaluation process, including the failure injection, and emphasizes reproducibility. The framework is validated by 16 diverse availability evaluations. The results show that distributed DBMS are not necessarily available even if sufficient replicas are available, and clients can experience significant downtimes.

7 citations


Cites background from "CLOTHO: directed test generation fo..."

  • ...Therefore, we do not discuss related approaches such as analysing the availability on the DBMS level [24] or evaluating the consistency impact on client and DBMS level [5]....


Proceedings ArticleDOI
19 Jun 2021
TL;DR: In this article, the authors present a sound fully-automated schema refactoring procedure that refactors a program's data layout to eliminate statically identified concurrency bugs, allowing more transactions to be safely executed under weaker and more performant database guarantees.
Abstract: Serializability is a well-understood concurrency control mechanism that eases reasoning about highly-concurrent database programs. Unfortunately, enforcing serializability has a high performance cost, especially on geographically distributed database clusters. Consequently, many databases allow programmers to choose when a transaction must be executed under serializability, with the expectation that transactions would only be so marked when necessary to avoid serious concurrency bugs. However, this is a significant burden to impose on developers, requiring them to (a) reason about subtle concurrent interactions among potentially interfering transactions, (b) determine when such interactions would violate desired invariants, and (c) then identify the minimum number of transactions whose executions should be serialized to prevent these violations. To mitigate this burden, this paper presents a sound fully-automated schema refactoring procedure that refactors a program’s data layout – rather than its concurrency control logic – to eliminate statically identified concurrency bugs, allowing more transactions to be safely executed under weaker and more performant database guarantees. Experimental results over a range of realistic database benchmarks indicate that our approach is highly effective in eliminating concurrency bugs, with safe refactored programs showing an average of 120% higher throughput and 45% lower latency compared to a serialized baseline.
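A hypothetical illustration of the general idea of refactoring data layout rather than concurrency control (the schema is invented for this note, not taken from the paper's benchmarks): two transactions that update logically independent fields of the same row conflict on that row; splitting the frequently written field into its own table removes the write-write conflict, so both transactions can run under a weaker, more performant isolation level.

```java
// Illustrative before/after schemas expressed as SQL DDL strings.
public class SchemaRefactoringSketch {
    // Before: updates to email and to last_login both write the same users row.
    static final String BEFORE =
        "CREATE TABLE users (id INT PRIMARY KEY, email TEXT, last_login TIMESTAMP)";

    // After: the frequently written field lives in its own table keyed by user id,
    // so the two updates touch disjoint rows and no longer conflict.
    static final String AFTER_USERS =
        "CREATE TABLE users (id INT PRIMARY KEY, email TEXT)";
    static final String AFTER_LOGINS =
        "CREATE TABLE user_logins (user_id INT PRIMARY KEY, last_login TIMESTAMP)";
}
```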

3 citations

Proceedings ArticleDOI
10 Oct 2022
TL;DR: DT2 as mentioned in this paper is an approach for automatically testing transaction implementations in DBMSs, which can support complex features, e.g., various database schemas and cross-table queries.
Abstract: Database Management Systems (DBMSs) utilize transactions to ensure the consistency and integrity of data. Incorrect transaction implementations in DBMSs can lead to severe consequences, e.g., incorrect database states and query results. Therefore, it is critical to ensure the reliability of transaction implementations. In this paper, we propose DT2, an approach for automatically testing transaction implementations in DBMSs. We first randomly generate a database and a group of concurrent transactions operating the database, which can support complex features in DBMSs, e.g., various database schemas and cross-table queries. We then leverage differential testing to compare transaction execution results on multiple DBMSs to find discrepancies. The non-determinism of concurrent transactions can affect the effectiveness of our method. Therefore, we propose a transaction test protocol to ensure the deterministic execution of concurrent transactions. We evaluate DT2 on three widely-used MySQL-compatible DBMSs: MySQL, MariaDB and TiDB. In total, we have detected 10 unique transaction bugs and 88 transaction-related compatibility issues from the observed discrepancies. Our empirical study on these compatibility issues shows that DBMSs suffer from various transaction-related compatibility issues, although they claim that they are compatible. These compatibility issues can also lead to serious consequences, e.g., inconsistent database states among DBMSs.
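A rough JDBC-style skeleton of the differential step described above, with placeholder connection URLs and a fixed statement order standing in for DT2's generated workload and its deterministic test protocol: run the same statements on two DBMSs and flag any difference in query results.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Execute the same deterministically ordered statements on two systems and compare rows.
public class DifferentialTxnTest {
    static List<String> run(String jdbcUrl, List<String> statements) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement st = conn.createStatement()) {
            for (String sql : statements) {
                if (sql.trim().toUpperCase().startsWith("SELECT")) {
                    try (ResultSet rs = st.executeQuery(sql)) {
                        while (rs.next()) rows.add(rs.getString(1));
                    }
                } else {
                    st.executeUpdate(sql);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        List<String> workload = List.of(
            "CREATE TABLE t (k INT PRIMARY KEY, v INT)",
            "INSERT INTO t VALUES (1, 10)",
            "UPDATE t SET v = v + 1 WHERE k = 1",
            "SELECT v FROM t WHERE k = 1");
        List<String> a = run("jdbc:mysql://localhost:3306/test", workload);   // placeholder URL
        List<String> b = run("jdbc:mariadb://localhost:3307/test", workload); // placeholder URL
        if (!a.equals(b)) {
            System.out.println("Discrepancy: " + a + " vs " + b); // candidate transaction bug
        }
    }
}
```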

3 citations

Book ChapterDOI
21 Jul 2020
TL;DR: In this paper, the authors propose appropriate semantics and specifications for highly-concurrent libraries in a weakly-consistent, replicated setting, and use these specifications to develop a static analysis framework that can automatically detect correctness violations of library implementations parameterized with respect to the different consistency policies provided by the underlying system.
Abstract: Geo-replicated systems provide a number of desirable properties such as globally low latency, high availability, scalability, and built-in fault tolerance. Unfortunately, programming correct applications on top of such systems has proven to be very challenging, in large part because of the weak consistency guarantees they offer. These complexities are exacerbated when we try to adapt existing highly-performant concurrent libraries developed for shared-memory environments to this setting. The use of these libraries, developed with performance and scalability in mind, is highly desirable. But, identifying a suitable notion of correctness to check their validity under a weakly consistent execution model has not been well-studied, in large part because it is problematic to naively transplant criteria such as linearizability that has a useful interpretation in a shared-memory context to a distributed one where the cost of imposing a (logical) global ordering on all actions is prohibitive. In this paper, we tackle these issues by proposing appropriate semantics and specifications for highly-concurrent libraries in a weakly-consistent, replicated setting. We use these specifications to develop a static analysis framework that can automatically detect correctness violations of library implementations parameterized with respect to the different consistency policies provided by the underlying system. We use our framework to analyze the behavior of a number of highly non-trivial library implementations of stacks, queues, and exchangers. Our results provide the first demonstration that automated correctness checking of concurrent libraries in a weakly geo-replicated setting is both feasible and practical.

3 citations

References
Proceedings ArticleDOI
22 May 1995
TL;DR: It is shown that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered, and new phenomena that better characterize isolation types are introduced.
Abstract: ANSI SQL-92 [MS, ANSI] defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered. Ambiguity in the statement of the phenomena is investigated and a more formal statement is arrived at; in addition new phenomena that better characterize isolation types are introduced. Finally, an important multiversion isolation type, called Snapshot Isolation, is defined.
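For reference, the isolation level under which such phenomena can or cannot occur is typically selected per connection; a small sketch using the standard java.sql API (the connection URL is a placeholder):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// At READ COMMITTED a transaction that reads the same row twice may see two different
// committed values (a non-repeatable read); at SERIALIZABLE that phenomenon is excluded.
public class IsolationLevels {
    static Connection open(String url, boolean serializable) throws SQLException {
        Connection conn = DriverManager.getConnection(url);  // placeholder connection URL
        conn.setAutoCommit(false);
        conn.setTransactionIsolation(serializable
            ? Connection.TRANSACTION_SERIALIZABLE
            : Connection.TRANSACTION_READ_COMMITTED);
        return conn;
    }
}
```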

1,086 citations


"CLOTHO: directed test generation fo..." refers background in this paper

  • ...For each benchmark, we initially limited the length of cycles to be at most 4, as all the canonical serializability anomalies, e.g. dirty reads and lost updates are of this length [Berenson et al. 1995]....


Journal ArticleDOI
TL;DR: Several efficiently recognizable subclasses of the class of serializable histories are introduced, and it is shown how these results can be extended to far more general transaction models, to transactions with partly interpreted functions, and to distributed database systems.
Abstract: A sequence of interleaved user transactions in a database system may not be serializable, i.e., equivalent to some sequential execution of the individual transactions. Using a simple transaction model, it is shown that recognizing the transaction histories that are serializable is an NP-complete problem. Several efficiently recognizable subclasses of the class of serializable histories are therefore introduced; most of these subclasses correspond to serializability principles existing in the literature and used in practice. Two new principles that subsume all previously known ones are also proposed. Necessary and sufficient conditions are given for a class of histories to be the output of an efficient history scheduler; these conditions imply that there can be no efficient scheduler that outputs all serializable histories, and also that all subclasses of serializable histories studied above have an efficient scheduler. Finally, it is shown how these results can be extended to far more general transaction models, to transactions with partly interpreted functions, and to distributed database systems.
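One of the efficiently recognizable subclasses mentioned above is conflict serializability, which can be decided in polynomial time by building a precedence graph and checking it for cycles; a small self-contained sketch (the history encoding is illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Add an edge Ti -> Tj whenever an operation of Ti conflicts with (same item, at least
// one write) and precedes an operation of Tj; the history is conflict-serializable iff
// the resulting precedence graph is acyclic.
public class ConflictSerializability {
    record Op(int txn, String item, boolean isWrite) {}

    static boolean isConflictSerializable(List<Op> history) {
        Map<Integer, Set<Integer>> edges = new HashMap<>();
        for (int i = 0; i < history.size(); i++) {
            for (int j = i + 1; j < history.size(); j++) {
                Op a = history.get(i), b = history.get(j);
                if (a.txn() != b.txn() && a.item().equals(b.item())
                        && (a.isWrite() || b.isWrite())) {
                    edges.computeIfAbsent(a.txn(), k -> new HashSet<>()).add(b.txn());
                }
            }
        }
        Set<Integer> done = new HashSet<>(), onStack = new HashSet<>();
        for (Integer t : edges.keySet()) {
            if (hasCycle(t, edges, done, onStack)) return false;
        }
        return true;
    }

    // Depth-first search that reports a back edge, i.e. a cycle in the precedence graph.
    private static boolean hasCycle(int t, Map<Integer, Set<Integer>> edges,
                                    Set<Integer> done, Set<Integer> onStack) {
        if (onStack.contains(t)) return true;
        if (done.contains(t)) return false;
        onStack.add(t);
        for (int next : edges.getOrDefault(t, Set.of())) {
            if (hasCycle(next, edges, done, onStack)) return true;
        }
        onStack.remove(t);
        done.add(t);
        return false;
    }
}
```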

1,028 citations


"CLOTHO: directed test generation fo..." refers background in this paper

  • ...In addition to the above challenges, the high computational cost of detecting serializability anomalies at runtime [Papadimitriou 1979] means that practical testing methods often leverage a set of user-provided application-level invariants and assertions to check for serializability failure-induced…


  • ...te, the chance of randomly encountering conflicts of this kind becomes even smaller. In addition to the above challenges, the high computational cost of detecting serializability anomalies at runtime [Papadimitriou 1979] means that practical testing methods often leverage a set of user-provided application-level invariants and assertions to check for serializability failure-induced violations. But, these assertions are oft...


Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work focuses on the fundamental noninterference property of atomicity and presents a dynamic analysis for detecting atomicity violations, which combines ideas from both Lipton's theory of reduction and earlier dynamic race detectors such as Eraser.
Abstract: Ensuring the correctness of multithreaded programs is difficult, due to the potential for unexpected interactions between concurrent threads. Much previous work has focused on detecting race conditions, but the absence of race conditions does not by itself prevent undesired thread interactions. We focus on the more fundamental non-interference property of atomicity; a method is atomic if its execution is not affected by and does not interfere with concurrently-executing threads. Atomic methods can be understood according to their sequential semantics, which significantly simplifies (formal and informal) correctness arguments. This paper presents a dynamic analysis for detecting atomicity violations. This analysis combines ideas from both Lipton's theory of reduction and earlier dynamic race detectors. Experience with a prototype checker for multithreaded Java code demonstrates that this approach is effective for detecting errors due to unintended interactions between threads. In particular, our atomicity checker detects errors that would be missed by standard race detectors, and it produces fewer false alarms on benign races that do not cause atomicity violations. Our experimental results also indicate that the majority of methods in our benchmarks are atomic, supporting our hypothesis that atomicity is a standard methodology in multithreaded programming.
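A minimal Java example of the kind of defect such a checker targets (the class is invented for illustration): each map operation is individually synchronized, so there is no data race, yet the check-then-act pair is not atomic and a concurrent thread can interleave between the two calls.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class NonAtomicRegistry {
    private final Map<String, Integer> ids =
        Collections.synchronizedMap(new HashMap<>());

    // Race-free but not atomic: another thread can insert the key between the
    // containsKey check and the put, and its value is then silently overwritten.
    public void registerIfAbsent(String name, int id) {
        if (!ids.containsKey(name)) {
            ids.put(name, id);
        }
    }

    // Atomic alternative using the map's compound operation.
    public void registerIfAbsentAtomic(String name, int id) {
        ids.putIfAbsent(name, id);
    }
}
```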

428 citations

Proceedings ArticleDOI
20 Oct 2006
TL;DR: An innovative concurrent-program invariant that captures programmers' atomicity assumptions is proposed and a tool with two implementations is described that can automatically extract such invariants and detect atomicity violation bugs.
Abstract: Concurrency bugs are among the most difficult to test and diagnose of all software bugs. The multicore technology trend worsens this problem. Most previous concurrency bug detection work focuses on one bug subclass, data races, and neglects many other important ones such as atomicity violations, which will soon become increasingly important due to the emerging trend of transactional memory models. This paper proposes an innovative, comprehensive, invariant-based approach called AVIO to detect atomicity violations. Our idea is based on a novel observation called access interleaving invariant, which is a good indication of programmers' assumptions about the atomicity of certain code regions. By automatically extracting such invariants and detecting violations of these invariants at run time, AVIO can detect a variety of atomicity violations. Based on this idea, we have designed and built two implementations of AVIO and evaluated the trade-offs between them. The first implementation, AVIO-S, is purely in software, while the second, AVIO-H, requires some simple extensions to the cache coherence hardware. AVIO-S is cheaper and more accurate but incurs much higher overhead and thus more run-time perturbation than AVIO-H. Therefore, AVIO-S is more suitable for in-house bug detection and postmortem bug diagnosis, while AVIO-H can be used for bug detection during production runs. We evaluate both implementations of AVIO using large real-world server applications (Apache and MySQL) with six representative real atomicity violation bugs, and SPLASH-2 benchmarks. Our results show that AVIO detects more tested atomicity violations of various types and has 25 times fewer false positives than previous solutions on average.
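As a rough sketch of the invariant idea (the enum and method are illustrative, not AVIO's implementation), an access interleaving can be classified by the local access that precedes, the interleaved remote access, and the local access that follows; a remote write that splits two local reads, for example, is one of the patterns commonly treated as unserializable.

```java
// Classify a local/remote/local access triple as serializable or not.
public class InterleavingClassifier {
    enum Access { READ, WRITE }

    static boolean unserializable(Access precedingLocal, Access remote, Access currentLocal) {
        // Four patterns commonly treated as unserializable: R-W-R, W-W-R, R-W-W, W-R-W.
        return (precedingLocal == Access.READ  && remote == Access.WRITE && currentLocal == Access.READ)
            || (precedingLocal == Access.WRITE && remote == Access.WRITE && currentLocal == Access.READ)
            || (precedingLocal == Access.READ  && remote == Access.WRITE && currentLocal == Access.WRITE)
            || (precedingLocal == Access.WRITE && remote == Access.READ  && currentLocal == Access.WRITE);
    }
}
```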

406 citations


"CLOTHO: directed test generation fo..." refers background in this paper

  • ...…of developers’ atomicity assumption for code regions in shared memory programs have been recently introduced where certain conflicting access patterns have been deemed safe due to lack of data flow between the read and write operations involved in such conflicts [Lu et al. 2006; Lucia et al. 2010]....


Journal ArticleDOI
TL;DR: A theory is developed that characterizes when nonserializable executions of applications can occur under Snapshot Isolation; it is applied to demonstrate that the TPC-C benchmark application has no serialization anomalies under SI, and the paper discusses how this demonstration can be generalized to other applications.
Abstract: Snapshot Isolation (SI) is a multiversion concurrency control algorithm, first described in Berenson et al. [1995]. SI is attractive because it provides an isolation level that avoids many of the common concurrency anomalies, and has been implemented by Oracle and Microsoft SQL Server (with certain minor variations). SI does not guarantee serializability in all cases, but the TPC-C benchmark application [TPC-C], for example, executes under SI without serialization anomalies. All major database system products are delivered with default nonserializable isolation levels, often ones that encounter serialization anomalies more commonly than SI, and we suspect that numerous isolation errors occur each day at many large sites because of this, leading to corrupt data sometimes noted in data warehouse applications. The classical justification for lower isolation levels is that applications can be run under such levels to improve efficiency when they can be shown not to result in serious errors, but little or no guidance has been offered to application programmers and DBAs by vendors as to how to avoid such errors. This article develops a theory that characterizes when nonserializable executions of applications can occur under SI. Near the end of the article, we apply this theory to demonstrate that the TPC-C benchmark application has no serialization anomalies under SI, and then discuss how this demonstration can be generalized to other applications. We also present a discussion on how to modify the program logic of applications that are nonserializable under SI so that serializability will be guaranteed.
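A compact illustration of the canonical SI anomaly (write skew), using an invented schema and constraint: two transactions read overlapping data from the same snapshot, then write disjoint rows, so SI's first-committer-wins check lets both commit even though the combined result violates the application invariant.

```java
// Intended invariant: at least one row in oncall(doctor TEXT PRIMARY KEY, on_call BOOLEAN)
// has on_call = TRUE, and the application only issues its UPDATE when the CHECK query
// returns at least 2. Starting with alice and bob both on call:
//   T1: CHECK  -- its snapshot shows 2, so T1 proceeds
//   T2: CHECK  -- same snapshot, also 2, so T2 proceeds
//   T1: UPDATE ... doctor = 'alice'
//   T2: UPDATE ... doctor = 'bob'
// The write sets are disjoint, so neither transaction aborts under SI and nobody is left
// on call; in a serial execution the second CHECK would have seen 1 and refused to update.
public class WriteSkewExample {
    static final String CHECK  = "SELECT count(*) FROM oncall WHERE on_call";
    static final String T1_UPD = "UPDATE oncall SET on_call = FALSE WHERE doctor = 'alice'";
    static final String T2_UPD = "UPDATE oncall SET on_call = FALSE WHERE doctor = 'bob'";
}
```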

351 citations