Proceedings Article

On Load Shedding in Complex Event Processing

TL;DR: This paper formalizes broad classes of CEP load-shedding scenarios as different optimization problems, demonstrates an array of complexity results that reveal the hardness of these problems, and constructs shedding algorithms with performance guarantees.
Abstract: Complex Event Processing (CEP) is a stream processing model that focuses on detecting event patterns in continuous event streams. While the CEP model has gained popularity in the research communities and commercial technologies, the problem of gracefully degrading performance under heavy load in the presence of resource constraints, or load shedding, has been largely overlooked. CEP is similar to “classical” stream data management, but addresses a substantially different class of queries. This unfortunately renders the load shedding algorithms developed for stream data processing inapplicable. In this paper we study CEP load shedding under various resource constraints. We formalize broad classes of CEP load-shedding scenarios as different optimization problems. We demonstrate an array of complexity results that reveal the hardness of these problems and construct shedding algorithms with performance guarantees. Our results shed some light on the difficulty of developing load-shedding algorithms that maximize utility.
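The memory-constrained flavor of the problem can be pictured with a toy brute-force formulation (a hypothetical sketch only; the event types, costs, and utilities below are invented, and the paper's actual algorithms and hardness results are far more involved): choose which event types to retain under a memory budget so that the total utility of fully served queries is maximized.

```python
from itertools import combinations

def best_retained_types(type_cost, queries, budget):
    """Brute-force, utility-maximal shedding over event types.

    type_cost: dict of event type -> memory cost of retaining it
    queries:   list of (required_types, utility) pairs; a query yields
               its utility only if *all* its event types are retained
    budget:    total memory available
    """
    types = list(type_cost)
    best_utility, best_keep = 0, frozenset()
    for r in range(len(types) + 1):
        for keep in combinations(types, r):
            keep = frozenset(keep)
            if sum(type_cost[t] for t in keep) > budget:
                continue  # violates the memory constraint
            utility = sum(u for req, u in queries if req <= keep)
            if utility > best_utility:
                best_utility, best_keep = utility, keep
    return best_utility, best_keep

# Retaining {A, C, E} (cost 3) serves the first two queries (utility 9)
# but sheds B and D, sacrificing the third query.
cost = {"A": 1, "B": 3, "C": 1, "D": 3, "E": 1}
qs = [(frozenset("AC"), 5), (frozenset("CE"), 4), (frozenset("BD"), 2)]
print(best_retained_types(cost, qs, budget=3))
```

Even this toy version is a combinatorial subset-selection problem, which hints at why maximizing utility under resource constraints is hard.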


Citations
Journal ArticleDOI
TL;DR: The main techniques and state-of-the-art research efforts in IoT from data-centric perspectives are reviewed, including data stream processing, data storage models, complex event processing, and searching in IoT.

289 citations

Posted Content
TL;DR: The main techniques and state-of-the-art research efforts in IoT from data-centric perspectives are surveyed, including data stream processing, data storage models, complex event processing, and searching in IoT.
Abstract: With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

43 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...For example, Heinze et al. [2013] study complex event processing in a distributed environment and propose FUGU – an elastic allocator for Complex Event Processing systems....


  • ...Very recently, He et al. [2014] investigate load shedding techniques for complex event processing under various resource constraints....


Proceedings ArticleDOI
13 Jun 2016
TL;DR: This paper provides a theoretical analysis proving that LAS is an (ε, δ)-approximation of the optimal online load shedder and shows its performance through a practical evaluation based both on simulations and on a running prototype.
Abstract: Load shedding is a technique employed by stream processing systems to handle unpredictable spikes in the input load whenever available computing resources are not adequately provisioned. A load shedder drops tuples to keep the input load below a critical threshold and thus avoid tuple queuing and system thrashing. In this paper we propose Load-Aware Shedding (LAS), a novel load shedding solution that drops tuples with the aim of maintaining queuing times below a tunable threshold. Tuple execution durations are estimated at runtime using efficient sketch data structures. We provide a theoretical analysis proving that LAS is an (ε, δ)-approximation of the optimal online load shedder and show its performance through a practical evaluation based both on simulations and on a running prototype.
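The mechanism described above can be approximated in a few lines (a simplified sketch: a per-type running mean stands in for the sketch data structures LAS actually uses, and LAS's approximation guarantee does not carry over to this toy):

```python
class LoadAwareShedder:
    """Simplified load-aware shedder: admit a tuple only if the
    estimated queuing time stays below a tunable threshold."""

    def __init__(self, threshold):
        self.threshold = threshold  # max tolerated queuing time (seconds)
        self.est = {}               # tuple type -> mean execution duration
        self.count = {}
        self.queued = 0.0           # estimated work currently queued

    def observe(self, ttype, duration):
        """Refine the duration estimate after a tuple finishes."""
        n = self.count.get(ttype, 0) + 1
        self.count[ttype] = n
        prev = self.est.get(ttype, 0.0)
        self.est[ttype] = prev + (duration - prev) / n
        self.queued = max(0.0, self.queued - duration)

    def admit(self, ttype):
        """True = process the tuple, False = shed it."""
        d = self.est.get(ttype, 0.0)
        if self.queued + d > self.threshold:
            return False  # dropping keeps queuing time bounded
        self.queued += d
        return True
```

Admitting a tuple charges its estimated duration to the queue and completions credit it back, so the shed rate automatically tracks the offered load.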

31 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...in [5] specialized the problem to the case of complex event processing....


Journal ArticleDOI
01 Jan 2020
TL;DR: This paper reviews core components that enable large-scale querying and indexing for microblogs data, and discusses system-level issues and on-going effort on supporting microblogs through the rising wave of big data systems.
Abstract: Microblogs data is the micro-length user-generated data that is posted on the web, e.g., tweets, online reviews, comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of interests including targeted advertising, market reports, news delivery, political campaigns, rescue services, and public health. Consequently, major research efforts have been spent to manage, analyze, and visualize microblogs to support different applications. This paper gives a comprehensive review of major research and system work in microblogs data management. The paper reviews core components that enable large-scale querying and indexing for microblogs data. A dedicated part gives particular focus for discussing system-level issues and on-going effort on supporting microblogs through the rising wave of big data systems. In addition, we review the major research topics that exploit these core data management components to provide innovative and effective analysis and visualization for microblogs, such as event detection, recommendations, automatic geotagging, and user queries. Throughout the different parts, we highlight the challenges, innovations, and future opportunities in microblogs data research.

23 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...ment in database systems [97], anti-caching in main-memory databases [85,197,374], and load shedding in data stream management systems [33,112,138], flushing in microblogs...


Proceedings ArticleDOI
20 Apr 2020
TL;DR: This work introduces a hybrid model that combines both input-based and state-based shedding to achieve high result quality under constrained resources and indicates that such hybrid shedding improves the recall by up to 14× for synthetic data and 11.4× for real-world data, compared to baseline approaches.
Abstract: Complex event processing (CEP) systems that evaluate queries over streams of events may face unpredictable input rates and query selectivities. During short peak times, exhaustive processing is then no longer reasonable, or even infeasible, and systems shall resort to best-effort query evaluation and strive for optimal result quality while staying within a latency bound. In traditional data stream processing, this is achieved by load shedding that discards some stream elements without processing them based on their estimated utility for the query result. We argue that such input-based load shedding is not always suitable for CEP queries. It assumes that the utility of each individual element of a stream can be assessed in isolation. For CEP queries, however, this utility may be highly dynamic: Depending on the presence of partial matches, the impact of discarding a single event can vary drastically. In this work, we therefore complement input-based load shedding with a state-based technique that discards partial matches. We introduce a hybrid model that combines both input-based and state-based shedding to achieve high result quality under constrained resources. Our experiments indicate that such hybrid shedding improves the recall by up to 14× for synthetic data and 11.4× for real-world data, compared to baseline approaches.
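The hybrid idea can be sketched as follows (illustrative only: the utility functions, budgets, and `progress` field are invented for the example, and the paper's actual shedding decisions are considerably more sophisticated):

```python
def hybrid_shed(events, matches, event_utility, match_utility,
                event_budget, match_budget):
    """Keep the highest-utility input events (input-based shedding)
    and the highest-utility partial matches (state-based shedding),
    dropping everything over budget."""
    kept_events = sorted(events, key=event_utility, reverse=True)[:event_budget]
    kept_matches = sorted(matches, key=match_utility, reverse=True)[:match_budget]
    return kept_events, kept_matches

# Events carry a static per-type utility; a partial match is scored by
# its progress toward completion (closer to a full match = more valuable).
events = [("A", 0.9), ("B", 0.2), ("C", 0.7)]
matches = [{"progress": 1}, {"progress": 3}, {"progress": 2}]
kept_e, kept_m = hybrid_shed(events, matches,
                             lambda e: e[1], lambda m: m["progress"],
                             event_budget=2, match_budget=1)
```

Scoring partial matches by progress captures the dynamic-utility point of the abstract: discarding an almost-complete match costs far more recall than discarding a fresh one.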

20 citations


Cites background or methods from "On Load Shedding in Complex Event P..."

  • ...The characteristics and the complexity of load shedding for CEP has been discussed in [24]....


  • ...This is infeasible for CEP [24], due to the high volatility of query selectivity and, therefore, processing rates of a system....


  • ...The aforementioned techniques are not applicable for CEP, though [24], as we discuss based on the questions of when to shed (Q1); what to shed (Q2); and how much to shed (Q3)....


  • ...Against this background, CEP systems shall employ best-effort processing, when resource demands peak [24]....


References
Journal ArticleDOI
01 Jun 2011
TL;DR: iCBS takes the query costs derived from the service level agreements between the service provider and its customers into account to make cost-aware scheduling decisions, and reduces the online time complexity from O(N) for the original version CBS to O(log² N) for iCBS.
Abstract: In a cloud computing environment, it is beneficial for the cloud service provider to offer differentiated services among different customers, who often have different cost profiles. Therefore, cost-aware scheduling of queries is important. A practical cost-aware scheduling algorithm must be able to handle the highly demanding query volumes in the scheduling queues to make online scheduling decisions very quickly. We develop such a highly efficient cost-aware query scheduling algorithm, called iCBS. iCBS takes the query costs derived from the service level agreements (SLAs) between the service provider and its customers into account to make cost-aware scheduling decisions. iCBS is an incremental variation of an existing scheduling algorithm, CBS. Although CBS exhibits an exceptionally good cost performance, it has a prohibitive time complexity. Our main contributions are (1) to observe how CBS behaves under piecewise linear SLAs, which are very common in cloud computing systems, and (2) to efficiently leverage these observations and to reduce the online time complexity from O(N) for the original version CBS to O(log² N) for iCBS.
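A minimal way to picture cost-aware scheduling under piecewise linear SLAs (a greedy stand-in written for illustration; the segment encoding and tier names are invented, and this is not the CBS/iCBS algorithm itself):

```python
def sla_cost(sla, response_time):
    """Penalty of a piecewise linear SLA, given as (start, base, slope)
    segments sorted by start time; cost grows linearly within a segment."""
    for start, base, slope in reversed(sla):
        if response_time >= start:
            return base + slope * (response_time - start)
    return 0.0

def pick_next(queries, now):
    """Run the waiting query whose penalty is currently growing fastest.
    queries: list of (name, arrival_time, sla) triples."""
    def current_slope(query):
        _, arrival, sla = query
        waited = now - arrival
        slope = 0.0
        for start, _, seg_slope in sla:
            if waited >= start:
                slope = seg_slope
        return slope
    return max(queries, key=current_slope)[0]

# "gold" is penalty-free for 2 s, then its penalty climbs steeply;
# "bronze" accrues penalty slowly from the start.
gold = [(0, 0, 0.0), (2, 0, 5.0)]
bronze = [(0, 0, 1.0)]
print(pick_next([("bronze", 0, bronze), ("gold", 0, gold)], now=3))  # → gold
```

The piecewise linear shape is what iCBS exploits: within a segment the penalty slope is constant, so priorities only need recomputing at segment boundaries.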

61 citations


"On Load Shedding in Complex Event P..." refers background in this paper

  • ...They have a financial incentive to judiciously shed work from queries that are associated with a low penalty cost as specified in Service Level Agreements (SLAs), so that their profits can be maximized (similar problems have been called “profit maximization in a cloud” and have been considered in the Database-as-a-Service literature [18, 19])....


01 Jan 2003
TL;DR: Probabilistic arguments for justifying the quality of an approximate solution for global quadratic minimization problem are developed, obtained as a best point among all points of a uniform grid inside a polyhedral feasible set and some related problems are shown to be NP-hard.
Abstract: In this paper we develop probabilistic arguments for justifying the quality of an approximate solution for global quadratic minimization problem, obtained as a best point among all points of a uniform grid inside a polyhedral feasible set. Our main tool is a random walk inside the standard simplex, for which it is easy to find explicit probabilistic characteristics. For any integer k ≥ 1 we can generate an approximate solution with relative accuracy 1/k provided that the quadratic objective function is non-negative in all nodes of the feasible set. The complexity of the process is polynomial in the number of nodes and in the dimension of the space of variables. We extend some of the results to problems with polynomial objective function. We conclude the paper by showing that some related problems (maximization of cubic or quartic form over the Euclidean ball, and the matrix ellipsoid problem) are NP-hard.

60 citations

Proceedings ArticleDOI
21 Mar 2011
TL;DR: This paper proposes a novel data structure, called SLA-tree, to efficiently support profit-oriented decision making in cloud computing, and efficiently support the answering of certain profit-oriented "what if" questions.
Abstract: As cloud computing becomes increasingly important in database systems, many new challenges and opportunities have arisen. One challenge is that in cloud computing, business profit plays a central role. Hence, it is very important for a cloud service provider to quickly make profit-oriented decisions. In this paper, we propose a novel data structure, called SLA-tree, to efficiently support profit-oriented decision making. SLA-tree is built on two pieces of information: (1) a set of buffered queries waiting to be executed, which represents the scheduled events that will happen in the near future, and (2) a service level agreement (SLA) for each query, which indicates the different profits for the query for varying query response times. By constructing the SLA-tree, we efficiently support the answering of certain profit-oriented "what if" questions. Answers to these questions in turn can be applied to different profit-oriented decisions in cloud computing such as profit-aware scheduling, dispatching, and capacity planning. Extensive experimental results based on both synthetic and real-world data demonstrate the effectiveness and efficiency of our SLA-tree framework.

58 citations

Proceedings ArticleDOI
11 Jun 2007
TL;DR: A formal framework is created and it is shown that there is a unique model up to isomorphism that satisfies the standard axioms and supports associativity, so this model is ideally suited to be the standard temporal model for complex event processing.
Abstract: Event processing systems have wide applications ranging from managing events from RFID readers to monitoring RSS feeds. Consequently, there exists much work on them in the literature. The prevalent use of these systems is on-line recognition of patterns that are sequences of correlated events in event streams. Query semantics and implementation efficiency are inherently determined by the underlying temporal model: how events are sequenced (what is the "next" event), and how the time stamp of an event is represented. Many competing temporal models for event systems have been proposed, with no consensus on which approach is best. We take a foundational approach to this problem. We create a formal framework and present event system design choices as axioms. The axioms are grouped into standard axioms and desirable axioms. Standard axioms are common to the design of all event systems. Desirable axioms are not always satisfied, but are useful for achieving high performance. Given these axioms, we prove several important results. First, we show that there is a unique model up to isomorphism that satisfies the standard axioms and supports associativity, so our axioms are a sound and complete axiomatization of associative time stamps in event systems. This model requires time stamps with unbounded representations. We present a slightly weakened version of associativity that permits a temporal model with bounded representations. We show that adding the boundedness condition also results in a unique model, so again our axiomatization is sound and complete. We believe this model is ideally suited to be the standard temporal model for complex event processing.

55 citations


"On Load Shedding in Complex Event P..." refers background in this paper

  • ...In this case "shedding" all events of type B and D will sacrifice the results of Q3 but preserves A, C and E and meets the memory constraint....


  • ...Complex Event Processing (CEP) is a stream processing model that focuses on detecting event patterns in continuous event streams....


Proceedings ArticleDOI
11 Apr 2011
TL;DR: A combined stream processing system that adaptively balances workload between a dedicated local stream processor and a cloud stream processor, and can adapt effectively to workload variations, while only discarding a small percentage of input data is presented.
Abstract: Stream processing systems must handle stream data coming from real-time, high-throughput applications, for example in financial trading. Timely processing of streams is important and requires sufficient available resources to achieve high throughput and deliver accurate results. However, static allocation of stream processing resources in terms of machines is inefficient when input streams have significant rate variations—machines remain underutilised for long periods of average load. We present a combined stream processing system that, as the input stream rate varies, adaptively balances workload between a dedicated local stream processor and a cloud stream processor. This approach only utilises cloud machines when the local stream processor becomes overloaded. We evaluate a prototype system with financial trading data. Our results show that it can adapt effectively to workload variations, while only discarding a small percentage of input data.
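At its simplest, the adaptive balancing policy reduces to a threshold rule (a toy sketch with invented names and thresholds; the actual system's decision logic is richer):

```python
def route(local_queue_len, local_capacity, cloud_available):
    """Where to send the next input tuple."""
    if local_queue_len < local_capacity:
        return "local"   # headroom: keep processing on-premises
    if cloud_available:
        return "cloud"   # overloaded: spill to rented machines
    return "shed"        # last resort: discard the tuple

print(route(5, 10, False), route(12, 10, True), route(12, 10, False))
```

Shedding only fires when both the local processor and the cloud path are exhausted, which matches the abstract's claim of discarding just a small fraction of the input.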

43 citations


"On Load Shedding in Complex Event P..." refers background in this paper

  • ...Specifically, stream processing applications are gradually shifting to the cloud [33]....
