Proceedings Article

Scalable Pattern Sharing on Event Streams

14 Jun 2016-pp 495-510
TL;DR: The SPASS optimizer identifies opportunities for effective shared processing among CEP queries by leveraging time-based event correlations among queries and finds a shared pattern plan in polynomial-time covering all sequence patterns while still guaranteeing an optimality bound.
Abstract: Complex Event Processing (CEP) has emerged as a technology of choice for high-performance event analytics in time-critical decision-making applications. Yet it is becoming increasingly difficult to support high-performance event processing due to the rising number and complexity of event pattern queries and the increasingly high velocity of event streams. In this work we design the SPASS framework, which successfully tackles these demanding CEP workloads. Our SPASS optimizer identifies opportunities for effective shared processing among CEP queries by leveraging time-based event correlations among queries. The problem of pattern sharing is shown to be NP-hard by reducing the Minimum Substring Cover problem to our CEP pattern sharing problem. The SPASS optimizer is designed to find, in polynomial time, a shared pattern plan that covers all sequence patterns while still guaranteeing an optimality bound. To execute this shared pattern plan, the SPASS runtime employs stream transactions that ensure concurrent shared maintenance and re-use of sub-patterns across queries. Our experimental study confirms that the SPASS framework achieves a more than 16-fold performance improvement over the state-of-the-art solution across a wide range of experiments.
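To make the notion of a shareable sub-pattern concrete, the following minimal Python sketch enumerates contiguous sub-sequences that occur in at least two sequence queries. It is an illustration only and not the SPASS optimizer: the function name `shared_subpatterns`, the query names, and the event types are hypothetical, and the sketch ignores time windows, event correlations, and the cost model that the abstract describes.

```python
from collections import defaultdict


def shared_subpatterns(queries, min_len=2):
    """Map each contiguous sub-pattern to the set of queries containing it,
    keeping only sub-patterns that appear in at least two queries.

    `queries` maps a (hypothetical) query name to its list of event types.
    """
    owners = defaultdict(set)
    for name, pattern in queries.items():
        n = len(pattern)
        for start in range(n):
            # Every contiguous slice of length >= min_len is a candidate.
            for end in range(start + min_len, n + 1):
                owners[tuple(pattern[start:end])].add(name)
    return {sub: names for sub, names in owners.items() if len(names) >= 2}


# Hypothetical workload of three sequence queries over event types A..E.
queries = {
    "q1": ["A", "B", "C", "D"],
    "q2": ["B", "C", "E"],
    "q3": ["A", "B", "C"],
}
candidates = shared_subpatterns(queries)
for sub, names in sorted(candidates.items()):
    print(sub, sorted(names))
```

Enumerating candidates is the easy part. Choosing which of the overlapping candidates to share is the selection problem that the abstract shows to be NP-hard.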
Citations
Journal Article
01 Jan 2020
TL;DR: This survey elaborates on the whole pipeline from the time CER queries are expressed in the most prominent languages, to algorithmic toolkits for scaling-out CER to clustered and geo-distributed architectural settings.
Abstract: The concept of event processing is established as a generic computational paradigm in various application fields. Events report on state changes of a system and its environment. Complex event recognition (CER) refers to the identification of composite events of interest, which are collections of simple, derived events that satisfy some pattern, thereby providing the opportunity for reactive and proactive measures. Examples include the recognition of anomalies in maritime surveillance, electronic fraud, cardiac arrhythmias and epidemic spread. This survey elaborates on the whole pipeline from the time CER queries are expressed in the most prominent languages, to algorithmic toolkits for scaling-out CER to clustered and geo-distributed architectural settings. We also highlight future research directions.

80 citations

Journal Article
TL;DR: In this article, a survey of the state of the art in stream processing parallelization and elasticity is presented, which is necessary to consolidate the state-of-the-art and to plan future research directions on this basis.
Abstract: Stream Processing (SP) has evolved as the leading paradigm to process and gain value from the high volume of streaming data produced, e.g., in the domain of the Internet of Things. An SP system is a middleware that deploys a network of operators between data sources, such as sensors, and the consuming applications. SP systems typically face intense and highly dynamic data streams. Parallelization and elasticity enable SP systems to process these streams with continuous high quality of service. The current research landscape provides a broad spectrum of methods for parallelization and elasticity in SP. Each method makes specific assumptions and focuses on particular aspects. However, the literature lacks a comprehensive overview and categorization of the state of the art in SP parallelization and elasticity, which is necessary to consolidate the state of the research and to plan future research directions on this basis. Therefore, in this survey, we study the literature and develop a classification of current methods for both parallelization and elasticity in SP systems.

68 citations

Journal Article
TL;DR: A hierarchical fog-cloud computing CEP architecture for personalized services is proposed to accelerate response time and reduce resource waste, and experimental results show that FogCepCare is superior to a traditional IoT-based healthcare application.
Abstract: With the development of medical sensors and the IoT, personalized services that assist the daily living of elderly people and patients have become critical in IoT-based healthcare applications. However, the scale and complexity of personalized services are increasing because of the ubiquitous deployment of various kinds of medical sensors, which increases response time and wastes resources. Therefore, leveraging the advantages of complex event processing (CEP) in data stream processing, we propose a hierarchical fog-cloud computing CEP architecture for personalized services to accelerate response time and reduce resource waste. First, we introduce the proposed architecture, which includes a sensor layer, a fog layer, and a cloud layer. Second, we propose a series of optimizations for the architecture: a partitioning and clustering approach and a communication and parallel processing policy to optimize fog and cloud computing. Finally, we implement a prototype system based on the architecture, named FogCepCare. Experimental results show that FogCepCare is superior to a traditional IoT-based healthcare application.

55 citations


Cites background from "Scalable Pattern Sharing on Event S..."

  • ...Much research on the optimization of complex event pattern that adopts the distributed and parallel technology is emergency [10-18]....

    [...]

  • ...SPASS [18] optimizer identifies opportunities for effective shared processing among CEP queries by leveraging time-based event correlations among queries....

    [...]

Journal Article
01 Jul 2018
TL;DR: It is formally proved that the CEP Plan Generation problem is equivalent to the Join Query Plan Generation problem for a restricted class of patterns and can be reduced to it for a considerably wider range of classes, which implies the NP-completeness of the CEP Plan Generation problem.
Abstract: Complex event processing (CEP) is a prominent technology used in many modern applications for monitoring and tracking events of interest in massive data streams. CEP engines inspect real-time information flows and attempt to detect combinations of occurrences matching predefined patterns. This is done by combining basic data items, also called "primitive events", according to a pattern detection plan, in a manner similar to the execution of multi-join queries in traditional data management systems. Despite this similarity, little work has been done on utilizing existing join optimization methods to improve the performance of CEP-based systems. In this paper, we provide the first theoretical and experimental study of the relationship between these two research areas. We formally prove that the CEP Plan Generation problem is equivalent to the Join Query Plan Generation problem for a restricted class of patterns and can be reduced to it for a considerably wider range of classes. This result implies the NP-completeness of the CEP Plan Generation problem. We further show how join query optimization techniques developed over the last decades can be adapted and utilized to provide practically efficient solutions for complex event detection. Our experiments demonstrate the superiority of these techniques over existing strategies for CEP optimization in terms of throughput, latency, and memory consumption.

29 citations


Cites background or methods from "Scalable Pattern Sharing on Event S..."

  • ...This problem can be solved by applying known multiquery techniques [17, 35, 43, 44, 54]....

    [...]

  • ...Advanced methods were also proposed for multi-query CEP optimization [17, 35, 43, 44, 54]....

    [...]

Proceedings Article
25 Jun 2019
TL;DR: This paper presents a novel framework for real-time multi-pattern complex event processing based on formulating the above task as a global optimization problem and applying a combination of sharing and pattern reordering techniques to construct an optimal plan satisfying the problem constraints.
Abstract: Rapid advances in data-driven applications over recent years have intensified the need for efficient mechanisms capable of monitoring and detecting arbitrarily complex patterns in massive data streams. This task is usually performed by complex event processing (CEP) systems. CEP engines are required to process hundreds or even thousands of user-defined patterns in parallel under tight real-time constraints. To enhance the performance of this crucial operation, multiple techniques have been developed, utilizing well-known optimization approaches such as pattern rewriting and sharing common subexpressions. However, the scalability of these methods is limited by the high computation overhead, and the quality of the produced plans is compromised by ignoring significant parts of the solution space. In this paper, we present a novel framework for real-time multi-pattern complex event processing. Our approach is based on formulating the above task as a global optimization problem and applying a combination of sharing and pattern reordering techniques to construct an optimal plan satisfying the problem constraints. To the best of our knowledge, no such fusion was previously attempted in the field of CEP optimization. To locate the best possible evaluation plan in the resulting hyperexponential solution space, we design efficient local search algorithms that utilize the unique problem structure. An extensive theoretical and empirical analysis of our system demonstrates its superiority over state-of-the-art solutions.

26 citations


Cites background or methods from "Scalable Pattern Sharing on Event S..."

  • ...SPASS [54] selects the subpatterns to share according to a metric called ‘redundancy ratio’....

    [...]

  • ...Pattern sharing methods [7, 22, 45, 54, 64] utilize the structural similarities between different patterns to unify the processing of common subexpressions....

    [...]

  • ...Instead, related research efforts focused on utilizing heuristic approaches [56], dynamic programming [7, 49], local-ratio approximation algorithms [54], and branch-and-bound methods [28, 64] for similar optimization problems with hyperexponential solution spaces....

    [...]

  • ...We repeated the experiments summarized in Figures 10 and 11 for the basic sharing and the basic reordering methods, as well as for two recent state-of-the-art MCEP methods [54, 64]....

    [...]

  • ...Numerous advanced methods have been proposed for intra-pattern (sharing of subexpressions inside a nested pattern) [44, 55] and inter-pattern scenarios (sharing between different patterns) [7, 22, 45, 54, 64]....

    [...]

References
Proceedings Article
01 Jan 2003
TL;DR: The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams and leverages the PostgreSQL open source code base.
Abstract: Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, query processors based on adaptive dataflow will be necessary. The Telegraph project has developed a suite of novel technologies for continuously adaptive query processing. The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams. In this paper, we describe the system architecture and its underlying technology, and report on our ongoing implementation effort, which leverages the PostgreSQL open source code base. We also discuss open issues and our research agenda.

1,248 citations

Journal Article
16 May 2000
TL;DR: The design of the NiagaraCQ system is presented, some experimental results on the system's performance and scalability are given, and other techniques including incremental evaluation of continuous queries, use of both pull and push models for detecting heterogeneous data source changes, and memory caching are employed.
Abstract: Continuous queries are persistent queries that allow users to receive new results when they become available. While continuous query systems can transform a passive web into an active environment, they need to be able to support millions of queries due to the scale of the Internet. No existing systems have achieved this level of scalability. NiagaraCQ addresses this problem by grouping continuous queries based on the observation that many web queries share similar structures. Grouped queries can share the common computation, tend to fit in memory, and can reduce the I/O cost significantly. Furthermore, grouping on selection predicates can eliminate a large number of unnecessary query invocations. Our grouping technique is distinguished from previous group optimization approaches in the following ways. First, we use an incremental group optimization strategy with dynamic re-grouping. New queries are added to existing query groups, without having to regroup already installed queries. Second, we use a query-split scheme that requires minimal changes to a general-purpose query engine. Third, NiagaraCQ groups both change-based and timer-based queries in a uniform way. To ensure that NiagaraCQ is scalable, we have also employed other techniques including incremental evaluation of continuous queries, use of both pull and push models for detecting heterogeneous data source changes, and memory caching. This paper presents the design of the NiagaraCQ system and gives some experimental results on the system's performance and scalability.

1,162 citations

Proceedings Article
27 Jun 2006
TL;DR: This paper proposes a complex event language that significantly extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications and describes a query plan-based approach to efficiently implementing this language.
Abstract: In this paper, we present the design, implementation, and evaluation of a system that executes complex event queries over real-time streams of RFID readings encoded as events. These complex event queries filter and correlate events to match specific patterns, and transform the relevant events into new composite events for the use of external monitoring applications. Stream-based execution of these queries enables time-critical actions to be taken in environments such as supply chain management, surveillance and facility management, healthcare, etc. We first propose a complex event language that significantly extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications. We then describe a query plan-based approach to efficiently implementing this language. Our approach uses native operators to efficiently handle query-defined sequences, which are a key component of complex event processing, and pipeline such sequences to subsequent operators that are built by leveraging relational techniques. We also develop a large suite of optimization techniques to address challenges such as large sliding windows and intermediate result sizes. We demonstrate the effectiveness of our approach through a detailed performance analysis of our prototype implementation under a range of data and query workloads as well as through a comparison to a state-of-the-art stream processor.

902 citations

Proceedings Article
03 Jun 2002
TL;DR: This work presents a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework that provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because of the aggressive cross-query sharing of work and space that it enables.
Abstract: We present a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework. We show that our design provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because of the aggressive cross-query sharing of work and space that it enables. By breaking the abstraction of shared relational algebra expressions, our Telegraph CACQ implementation is able to share physical operators (both selections and join state) at a very fine grain. We augment these features with a grouped-filter index to simultaneously evaluate multiple selection predicates. We include measurements of the performance of our core system, along with a comparison to existing continuous query approaches.
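As a rough illustration of evaluating many selection predicates at once, the sketch below indexes predicates of the form `value > constant` in a sorted list, so that a single binary search finds every query satisfied by an incoming value. This is one plausible reading of a grouped filter and an assumption on our part, not CACQ's actual data structure: the class name `GreaterThanFilterGroup`, its methods, and the query identifiers are all hypothetical.

```python
import bisect


class GreaterThanFilterGroup:
    """Hypothetical index over predicates `value > constant` on one attribute."""

    def __init__(self):
        self._constants = []  # predicate constants, kept sorted ascending
        self._query_ids = []  # query ids, parallel to self._constants

    def add(self, query_id, constant):
        # Insert while preserving the sort order of the constants.
        i = bisect.bisect_left(self._constants, constant)
        self._constants.insert(i, constant)
        self._query_ids.insert(i, query_id)

    def matching(self, value):
        # `value > c` holds exactly for the constants strictly below `value`,
        # which form a prefix of the sorted list.
        i = bisect.bisect_left(self._constants, value)
        return self._query_ids[:i]


group = GreaterThanFilterGroup()
group.add("q1", 10)
group.add("q3", 30)
group.add("q2", 20)
print(group.matching(25))
```

One lookup replaces a separate predicate test per query, which is the kind of cross-query sharing the abstract refers to. Equality and less-than predicates would need their own structures.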

711 citations


"Scalable Pattern Sharing on Event S..." refers background or methods in this paper

  • ...In [28], an adaptive tuple-level sharing technique was proposed to share multiple query processing over data streams with fluctuations....

    [...]

  • ...There has also been some work on MQO for continuous queries over data streams [12, 28]....

    [...]

  • ..., tuple, pane, or full window) to share a sub-expression across queries [28]....

    [...]

Proceedings Article
09 Jun 2008
TL;DR: This paper presents a formal evaluation model that offers precise semantics for this new class of queries and a query evaluation framework permitting optimizations in a principled way and further analyzes the runtime complexity of query evaluation using this model and develops a suite of techniques that improve runtime efficiency by exploiting sharing in storage and processing.
Abstract: Pattern matching over event streams is increasingly being employed in many areas including financial services, RFID-based inventory management, click stream analysis, and electronic health systems. While regular expression matching is well studied, pattern matching over streams presents two new challenges: Languages for pattern matching over streams are significantly richer than languages for regular expression matching. Furthermore, efficient evaluation of these pattern queries over streams requires new algorithms and optimizations: the conventional wisdom for stream query processing (i.e., using selection-join-aggregation) is inadequate. In this paper, we present a formal evaluation model that offers precise semantics for this new class of queries and a query evaluation framework permitting optimizations in a principled way. We further analyze the runtime complexity of query evaluation using this model and develop a suite of techniques that improve runtime efficiency by exploiting sharing in storage and processing. Our experimental results provide insights into the various factors on runtime performance and demonstrate the significant performance gains of our sharing techniques.

441 citations


"Scalable Pattern Sharing on Event S..." refers background in this paper

  • ...SASE [3, 42] and Cayuga [6] follow an NFA-based approach for detecting sequence patterns where the arrival of events into the system are marked by state transitions....

    [...]

  • ...Using the SASE approach each pattern in the workload maintains its own event data-structures....

    [...]

  • ...to contain matches valid for the interval [3,5]....

    [...]

  • ...Many CEP systems [6, 3, 42] have been proposed to process sequence pattern queries over event streams....

    [...]

  • ...We now evaluate the efficiency and scalability of the SPASS Runtime compared to the most prominent approach, i.e., SASE [42]....

    [...]