Proceedings Article

On Load Shedding in Complex Event Processing

TL;DR: This paper formalizes broad classes of CEP load-shedding scenarios as different optimization problems, demonstrates an array of complexity results that reveal the hardness of these problems, and constructs shedding algorithms with performance guarantees.
Abstract: Complex Event Processing (CEP) is a stream processing model that focuses on detecting event patterns in continuous event streams. While the CEP model has gained popularity in the research communities and commercial technologies, the problem of gracefully degrading performance under heavy load in the presence of resource constraints, or load shedding, has been largely overlooked. CEP is similar to “classical” stream data management, but addresses a substantially different class of queries. This unfortunately renders the load shedding algorithms developed for stream data processing inapplicable. In this paper we study CEP load shedding under various resource constraints. We formalize broad classes of CEP load-shedding scenarios as different optimization problems. We demonstrate an array of complexity results that reveal the hardness of these problems and construct shedding algorithms with performance guarantees. Our results shed some light on the difficulty of developing load-shedding algorithms that maximize utility.


Citations
Journal ArticleDOI
TL;DR: The main techniques and state-of-the-art research efforts in IoT from data-centric perspectives are reviewed, including data stream processing, data storage models, complex event processing, and searching in IoT.

289 citations

Posted Content
TL;DR: The main techniques and state-of-the-art research efforts in IoT from data-centric perspectives are surveyed, including data stream processing, data storage models, complex event processing, and searching in IoT.
Abstract: With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

43 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...For example, Heinze et al. [2013] study complex event processing in a distributed environment and propose FUGU – an elastic allocator for Complex Event Processing systems....

    [...]

  • ...Very recently, He et al. [2014] investigate load shedding techniques for complex event processing under various resource constraints....

    [...]

Proceedings ArticleDOI
13 Jun 2016
TL;DR: This paper provides a theoretical analysis proving that LAS is an (ε, δ)-approximation of the optimal online load shedder and shows its performance through a practical evaluation based both on simulations and on a running prototype.
Abstract: Load shedding is a technique employed by stream processing systems to handle unpredictable spikes in the input load whenever available computing resources are not adequately provisioned. A load shedder drops tuples to keep the input load below a critical threshold and thus avoid tuple queuing and system thrashing. In this paper we propose Load-Aware Shedding (LAS), a novel load shedding solution that drops tuples with the aim of maintaining queuing times below a tunable threshold. Tuple execution durations are estimated at runtime using efficient sketch data structures. We provide a theoretical analysis proving that LAS is an (ε, δ)-approximation of the optimal online load shedder and show its performance through a practical evaluation based both on simulations and on a running prototype.
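The idea of shedding tuples to keep queuing time under a tunable threshold can be illustrated with a toy admission rule. This is a minimal sketch and not the LAS algorithm: it replaces the sketch data structures with exact per-key running averages, carries no approximation guarantee, and the class name `QueueTimeShedder`, its parameters, and the per-key cost model are all assumptions made for illustration.

```python
from collections import deque


class QueueTimeShedder:
    """Toy shedder: admit a tuple only while the estimated backlog is within a bound.

    Hypothetical illustration; exact per-key averages stand in for the
    sketch-based duration estimates described in the abstract.
    """

    def __init__(self, max_queue_time: float, default_cost: float = 1.0):
        self.max_queue_time = max_queue_time  # tunable queuing-time threshold
        self.default_cost = default_cost      # cost assumed for unseen keys
        self.backlog = 0.0                    # estimated work currently queued
        self._total = {}                      # key -> sum of observed durations
        self._count = {}                      # key -> number of observations
        self._charged = deque()               # costs charged to queued tuples (FIFO)

    def estimate(self, key) -> float:
        """Average observed duration for this key, or the default if unseen."""
        n = self._count.get(key, 0)
        return self._total[key] / n if n else self.default_cost

    def offer(self, key) -> bool:
        """Return True if the tuple is admitted, False if it is shed."""
        # The wait a new tuple would see is the work already queued ahead of it.
        if self.backlog > self.max_queue_time:
            return False
        cost = self.estimate(key)
        self._charged.append(cost)
        self.backlog += cost
        return True

    def complete(self, key, observed: float) -> None:
        """Record that the oldest queued tuple finished after `observed` time units."""
        self.backlog = max(0.0, self.backlog - self._charged.popleft())
        self._total[key] = self._total.get(key, 0.0) + observed
        self._count[key] = self._count.get(key, 0) + 1
```

A caller would invoke `offer` on each arriving tuple and `complete` when the operator finishes one, so that the duration estimates follow the observed load.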

31 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...in [5] specialized the problem to the case of complex event processing....

    [...]

Journal ArticleDOI
01 Jan 2020
TL;DR: This paper reviews core components that enable large-scale querying and indexing for microblogs data, and discusses system-level issues and on-going effort on supporting microblogs through the rising wave of big data systems.
Abstract: Microblogs data is the microlength user-generated data that is posted on the web, e.g., tweets, online reviews, comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of interests including targeted advertising, market reports, news delivery, political campaigns, rescue services, and public health. Consequently, major research efforts have been spent to manage, analyze, and visualize microblogs to support different applications. This paper gives a comprehensive review of major research and system work in microblogs data management. The paper reviews core components that enable large-scale querying and indexing for microblogs data. A dedicated part gives particular focus for discussing system-level issues and on-going effort on supporting microblogs through the rising wave of big data systems. In addition, we review the major research topics that exploit these core data management components to provide innovative and effective analysis and visualization for microblogs, such as event detection, recommendations, automatic geotagging, and user queries. Throughout the different parts, we highlight the challenges, innovations, and future opportunities in microblogs data research.

23 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...ment in database systems [97], anti-caching in main-memory databases [85,197,374], and load shedding in data stream management systems [33,112,138], flushing in microblogs...

    [...]

Proceedings ArticleDOI
20 Apr 2020
TL;DR: This work introduces a hybrid model that combines both input-based and state-based shedding to achieve high result quality under constrained resources and indicates that such hybrid shedding improves the recall by up to 14× for synthetic data and 11.4× for real-world data, compared to baseline approaches.
Abstract: Complex event processing (CEP) systems that evaluate queries over streams of events may face unpredictable input rates and query selectivities. During short peak times, exhaustive processing is then no longer reasonable, or even infeasible, and systems shall resort to best-effort query evaluation and strive for optimal result quality while staying within a latency bound. In traditional data stream processing, this is achieved by load shedding that discards some stream elements without processing them based on their estimated utility for the query result. We argue that such input-based load shedding is not always suitable for CEP queries. It assumes that the utility of each individual element of a stream can be assessed in isolation. For CEP queries, however, this utility may be highly dynamic: Depending on the presence of partial matches, the impact of discarding a single event can vary drastically. In this work, we therefore complement input-based load shedding with a state-based technique that discards partial matches. We introduce a hybrid model that combines both input-based and state-based shedding to achieve high result quality under constrained resources. Our experiments indicate that such hybrid shedding improves the recall by up to 14× for synthetic data and 11.4× for real-world data, compared to baseline approaches.

20 citations


Cites background or methods from "On Load Shedding in Complex Event P..."

  • ...The characteristics and the complexity of load shedding for CEP has been discussed in [24]....

    [...]

  • ...This is infeasible for CEP [24], due to the high volatility of query selectivity and, therefore, processing rates of a system....

    [...]

  • ...The aforementioned techniques are not applicable for CEP, though [24], as we discuss based on the questions of when to shed (Q1); what to shed (Q2); and how much to shed (Q3)....

    [...]

  • ...Against this background, CEP systems shall employ best-effort processing, when resource demands peak [24]....

    [...]

References
Proceedings ArticleDOI
27 Jun 2006
TL;DR: This paper proposes a complex event language that significantly extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications and describes a query plan-based approach to efficiently implementing this language.
Abstract: In this paper, we present the design, implementation, and evaluation of a system that executes complex event queries over real-time streams of RFID readings encoded as events. These complex event queries filter and correlate events to match specific patterns, and transform the relevant events into new composite events for the use of external monitoring applications. Stream-based execution of these queries enables time-critical actions to be taken in environments such as supply chain management, surveillance and facility management, healthcare, etc. We first propose a complex event language that significantly extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications. We then describe a query plan-based approach to efficiently implementing this language. Our approach uses native operators to efficiently handle query-defined sequences, which are a key component of complex event processing, and pipeline such sequences to subsequent operators that are built by leveraging relational techniques. We also develop a large suite of optimization techniques to address challenges such as large sliding windows and intermediate result sizes. We demonstrate the effectiveness of our approach through a detailed performance analysis of our prototype implementation under a range of data and query workloads as well as through a comparison to a state-of-the-art stream processor.

902 citations


"On Load Shedding in Complex Event P..." refers background or methods in this paper

  • ...Following the standard practice of the CEP literature [6, 7, 36, 51], we assume events are temporally ordered by their timestamps....

    [...]

  • ...The complex event processing or CEP model has received significant attention from the research community [6, 28, 35, 36, 49, 51], and has been adopted by a number of commercial systems including Microsoft StreamInsight [1], Sybase Aleri [3], and StreamBase [2]....

    [...]

  • ...The CEP model has been proposed and developed by a number of seminal papers (see [6] and [51] as examples)....

    [...]

  • ...While the emergence of CEP model has spawned a wide variety of applications, so far research efforts have focused almost exclusively on improving CEP query join efficiency [6, 28, 35, 36, 49, 51]....

    [...]

Journal ArticleDOI
TL;DR: This article investigates the maximum of a square-free quadratic form on a simplex, a question suggested by a problem of J. E. MacDonald Jr.
Abstract: Maximum of a square-free quadratic form on a simplex. The following question was suggested by a problem of J. E. MacDonald Jr. (1): Given a graph G with vertices 1, 2, . . . , n. Let S be the simplex in E^n given by x_i ≥ 0, Σ x_i = 1. What is

790 citations

Book ChapterDOI
09 Sep 2003
TL;DR: This paper examines a technique for dynamically inserting and removing drop operators into query plans as required by the current load, and addresses the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan.
Abstract: A Data Stream Manager accepts push-based inputs from a set of data sources, processes these inputs with respect to a set of standing queries, and produces outputs based on Quality-of-Service (QoS) specifications. When input rates exceed system capacity, the system will become overloaded and latency will deteriorate. Under these conditions, the system will shed load, thus degrading the answer, in order to improve the observed latency of the results. This paper examines a technique for dynamically inserting and removing drop operators into query plans as required by the current load. We examine two types of drops: the first drops a fraction of the tuples in a randomized fashion, and the second drops tuples based on the importance of their content. We address the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan. We describe efficient solutions and present experimental evidence that they can bring the system back into the useful operating range with minimal degradation in answer quality.
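The two drop types described in this abstract can be sketched as simple stream filters. This is an illustrative sketch only, not the Aurora implementation: the function names, the `importance` scoring callback, and the fixed `min_importance` cut-off are assumptions, and the sketch does not address when to shed or where in a query plan to place the drop.

```python
import random
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def random_drop(stream: Iterable[T], drop_fraction: float,
                rng: random.Random) -> Iterator[T]:
    """Randomized drop: discard each tuple independently with probability drop_fraction."""
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    for item in stream:
        # rng.random() lies in [0, 1), so 0.0 keeps everything and 1.0 drops everything.
        if rng.random() >= drop_fraction:
            yield item


def semantic_drop(stream: Iterable[T], importance: Callable[[T], float],
                  min_importance: float) -> Iterator[T]:
    """Content-based drop: keep only tuples whose importance reaches the cut-off."""
    for item in stream:
        if importance(item) >= min_importance:
            yield item
```

Both filters are generators, so they can be chained in front of any downstream operator and swapped in or out as the load changes.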

662 citations


"On Load Shedding in Complex Event P..." refers background in this paper

  • ...While load shedding has been extensively studied in the context of general stream processing [10, 20, 25, 26, 43, 44, 48, 49, 53], the focus there is aggregate queries or two-relation equi-join queries, which are important for traditional stream joins....

    [...]

  • ..., [9, 10, 20, 26, 33, 43, 44, 48, 49]) has been devoted to this problem....

    [...]

  • ...The work [44] studies the similar problem of strategically placing drop operator in the operator tree to optimize utility as defined by QoS graphs....

    [...]

  • ...Load shedding, an important issue that has been extensively studied in traditional stream processing [10, 20, 25, 26, 43, 44, 48, 49, 53], has been largely overlooked in the new context of CEP....

    [...]

Proceedings Article
01 Jan 2003
TL;DR: The architectural challenges facing the design of large-scale distributed stream processing systems are described, and novel approaches for addressing load management, high availability, and federated operation issues are discussed.
Abstract: Stream processing fits a large class of new applications for which conventional DBMSs fall short. Because many stream-oriented systems are inherently geographically distributed and because distribution offers scalable load management and higher availability, future stream processing systems will operate in a distributed fashion. They will run across the Internet on computers typically owned by multiple cooperating administrative domains. This paper describes the architectural challenges facing the design of large-scale distributed stream processing systems, and discusses novel approaches for addressing load management, high availability, and federated operation issues. We describe two stream processing systems, Aurora* and Medusa, which are being designed to explore complementary solutions to these challenges. This paper discusses the architectural issues facing the design of large-scale distributed stream processing systems. We begin in Section 2 with a brief description of our centralized stream processing system, Aurora [4]. We then discuss two complementary efforts to extend Aurora to a distributed environment: Aurora* and Medusa. Aurora* assumes an environment in which all nodes fall under a single administrative domain. Medusa provides the infrastructure to support federated operation of nodes across administrative boundaries. After describing the architectures of these two systems in Section 3, we consider three design challenges common to both: infrastructures and protocols supporting communication amongst nodes (Section 4), load sharing in response to variable network conditions (Section 5), and high availability in the presence of failures (Section 6). We also discuss high-level policy specifications employed by the two systems in Section 7. For all of these issues, we believe that the push-based nature of stream-based applications not only raises new challenges but also offers the possibility of new domain-specific solutions.

624 citations


"On Load Shedding in Complex Event P..." refers background in this paper

  • ...This is typically obtained by sampling the arriving stream [17, 41]....

    [...]

Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper presents a formal evaluation model that offers precise semantics for this new class of queries and a query evaluation framework permitting optimizations in a principled way and further analyzes the runtime complexity of query evaluation using this model and develops a suite of techniques that improve runtime efficiency by exploiting sharing in storage and processing.
Abstract: Pattern matching over event streams is increasingly being employed in many areas including financial services, RFID-based inventory management, click stream analysis, and electronic health systems. While regular expression matching is well studied, pattern matching over streams presents two new challenges: Languages for pattern matching over streams are significantly richer than languages for regular expression matching. Furthermore, efficient evaluation of these pattern queries over streams requires new algorithms and optimizations: the conventional wisdom for stream query processing (i.e., using selection-join-aggregation) is inadequate. In this paper, we present a formal evaluation model that offers precise semantics for this new class of queries and a query evaluation framework permitting optimizations in a principled way. We further analyze the runtime complexity of query evaluation using this model and develop a suite of techniques that improve runtime efficiency by exploiting sharing in storage and processing. Our experimental results provide insights into the various factors on runtime performance and demonstrate the significant performance gains of our sharing techniques.

441 citations


"On Load Shedding in Complex Event P..." refers background or methods in this paper

  • ...Following the standard practice of the CEP literature [6, 7, 37, 52], we assume events are temporally ordered by their timestamps....

    [...]

  • ...We note that there are three additional join semantics defined in [6], namely, skip-till-next-match, partition-contiguity and contiguity....

    [...]

  • ...Such queries are widely studied [2, 6, 28, 36, 37, 50, 52] and used in CEP systems....

    [...]

  • ...The CEP model has been proposed and developed by a number of seminal papers (see [6] and [52] as examples)....

    [...]

  • ...The complex event processing or CEP model has received significant attention from the research community [6, 28, 36, 37, 50, 52], and has been adopted by a number of commercial systems including Microsoft StreamInsight [1], Sybase Aleri [3], and StreamBase [2]....

    [...]