
Proceedings Article

On Load Shedding in Complex Event Processing

01 Mar 2014, pp. 213-224

TL;DR: This paper formalizes broad classes of CEP load-shedding scenarios as different optimization problems, demonstrates an array of complexity results that reveal the hardness of these problems, and constructs shedding algorithms with performance guarantees.

Abstract: Complex Event Processing (CEP) is a stream processing model that focuses on detecting event patterns in continuous event streams. While the CEP model has gained popularity in the research communities and commercial technologies, the problem of gracefully degrading performance under heavy load in the presence of resource constraints, or load shedding, has been largely overlooked. CEP is similar to “classical” stream data management, but addresses a substantially different class of queries. This unfortunately renders the load shedding algorithms developed for stream data processing inapplicable. In this paper we study CEP load shedding under various resource constraints. We formalize broad classes of CEP load-shedding scenarios as different optimization problems. We demonstrate an array of complexity results that reveal the hardness of these problems and construct shedding algorithms with performance guarantees. Our results shed some light on the difficulty of developing load-shedding algorithms that maximize utility.

Topics: Complex event processing (60%)


Citations

Journal ArticleDOI
TL;DR: The main techniques and state-of-the-art research efforts in IoT from data-centric perspectives are reviewed, including data stream processing, data storage models, complex event processing, and searching in IoT.
Abstract: With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, but several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy and continuous. This paper reviews the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

243 citations


Posted Content
TL;DR: The main techniques and state-of-the-art research efforts in IoT from data-centric perspectives are surveyed, including data stream processing, data storage models, complex event processing, and searching in IoT.
Abstract: With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

41 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...For example, Heinze et al. [2013] study complex event processing in a distributed environment and propose FUGU – an elastic allocator for Complex Event Processing systems....

    [...]

  • ...Very recently, He et al. [2014] investigate load shedding techniques for complex event processing under various resource constraints....

    [...]


Proceedings ArticleDOI
13 Jun 2016
TL;DR: This paper provides a theoretical analysis proving that LAS is an (ε, δ)-approximation of the optimal online load shedder and shows its performance through a practical evaluation based both on simulations and on a running prototype.
Abstract: Load shedding is a technique employed by stream processing systems to handle unpredictable spikes in the input load whenever available computing resources are not adequately provisioned. A load shedder drops tuples to keep the input load below a critical threshold and thus avoid tuple queuing and system thrashing. In this paper we propose Load-Aware Shedding (LAS), a novel load shedding solution that drops tuples with the aim of maintaining queuing times below a tunable threshold. Tuple execution durations are estimated at runtime using efficient sketch data structures. We provide a theoretical analysis proving that LAS is an (ε, δ)-approximation of the optimal online load shedder and show its performance through a practical evaluation based both on simulations and on a running prototype.
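The idea of dropping tuples to keep queuing time below a tunable threshold can be illustrated with a minimal Python sketch. This is not LAS itself: the class name `ThresholdShedder`, its fields, and the assumption that each tuple arrives with a known estimated cost are all hypothetical, and the sketch-based cost estimation mentioned in the abstract is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ThresholdShedder:
    """Toy admission control (hypothetical, not the LAS algorithm)."""

    max_queue_time: float  # tunable bound on estimated queuing time, in seconds
    backlog: float = 0.0   # estimated work already queued, in seconds

    def offer(self, est_cost: float) -> bool:
        """Return True if the tuple is admitted, False if it is shed."""
        # A new tuple would wait roughly `backlog` seconds before running.
        if self.backlog > self.max_queue_time:
            return False
        self.backlog += est_cost
        return True

    def advance(self, elapsed: float) -> None:
        """Account for `elapsed` seconds of processing by the operator."""
        self.backlog = max(0.0, self.backlog - elapsed)


# Five tuples of estimated cost 0.5 s arrive in a burst with a 1.0 s threshold.
shedder = ThresholdShedder(max_queue_time=1.0)
decisions = [shedder.offer(0.5) for _ in range(5)]

# After one second of processing, the backlog shrinks and tuples are admitted again.
shedder.advance(1.0)
admitted_after_drain = shedder.offer(0.5)
```

In this toy run the first three tuples are admitted and the last two are shed, because the estimated backlog exceeds the threshold only after the third admission.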

28 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...in [5] specialized the problem to the case of complex event processing....

    [...]


Journal ArticleDOI
01 Jan 2020
TL;DR: This paper reviews core components that enable large-scale querying and indexing for microblogs data, and discusses system-level issues and on-going effort on supporting microblogs through the rising wave of big data systems.
Abstract: Microblogs data is the microlength user-generated data that is posted on the web, e.g., tweets, online reviews, comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of interests including targeted advertising, market reports, news delivery, political campaigns, rescue services, and public health. Consequently, major research efforts have been spent to manage, analyze, and visualize microblogs to support different applications. This paper gives a comprehensive review of major research and system work in microblogs data management. The paper reviews core components that enable large-scale querying and indexing for microblogs data. A dedicated part gives particular focus for discussing system-level issues and on-going effort on supporting microblogs through the rising wave of big data systems. In addition, we review the major research topics that exploit these core data management components to provide innovative and effective analysis and visualization for microblogs, such as event detection, recommendations, automatic geotagging, and user queries. Throughout the different parts, we highlight the challenges, innovations, and future opportunities in microblogs data research.

18 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...ment in database systems [97], anti-caching in main-memory databases [85,197,374], and load shedding in data stream management systems [33,112,138], flushing in microblogs...

    [...]


Proceedings ArticleDOI
16 May 2016
Abstract: Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. With the continuity and excessive numbers of microblogs, it is infeasible to keep data in main memory for long periods. Thus, once the allocated memory budget is filled, a portion of the data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques come with either a low memory hit ratio, due to flushing items regardless of their relevance to incoming queries, or significant overhead of tracking individual data items, which limits the scalability of microblogs systems in either case. In this paper, we propose the kFlushing policy, which exploits the popularity of top-k queries in microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase the memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is utilized to accumulate more useful data that is used to answer more queries from memory contents. When all memory is utilized for useful data, kFlushing flushes data that is less likely to degrade the memory hit ratio. In addition, kFlushing incurs little overhead, which preserves high system scalability in terms of high digestion rates of incoming fast data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing, which improves the main-memory hit ratio by 26–330% while coping with fast microblog streams of up to 100K microblogs/second.

11 citations


Cites background from "On Load Shedding in Complex Event P..."

  • ...1. kFlushing main idea different terminologies, e.g., buffer management in database management systems (DBMSs) [9], anti-caching in main-memory databases [8, 15, 30], and load shedding in data stream management systems (DSMSs) [1, 12, 13]....

    [...]


References

Book
01 Mar 2004
Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,299 citations


Book
02 Jul 2001
TL;DR: Covering the basic techniques used in the latest research work, the author consolidates progress made so far, including some very recent and promising results, and conveys the beauty and excitement of work in the field.
Abstract: Covering the basic techniques used in the latest research work, the author consolidates progress made so far, including some very recent and promising results, and conveys the beauty and excitement of work in the field. He gives clear, lucid explanations of key results and ideas, with intuitive proofs, and provides critical examples and numerous illustrations to help elucidate the algorithms. Many of the results presented have been simplified and new insights provided. Of interest to theoretical computer scientists, operations researchers, and discrete mathematicians.

4,127 citations


Additional excerpts

  • ...[46] The integral CPU-bound load shedding prob-...

    [...]


Book
01 Jan 1998
TL;DR: This book discusses competitive analysis and decision making under uncertainty in the context of the k-server problem, which involves randomized algorithms in order to solve the problem of paging.
Abstract: Preface 1. Introduction to competitive analysis: the list accessing problem 2. Introduction to randomized algorithms: the list accessing problem 3. Paging: deterministic algorithms 4. Paging: randomized algorithms 5. Alternative models for paging: beyond pure competitive analysis 6. Game theoretic foundations 7. Request - answer games 8. Competitive analysis and zero-sum games 9. Metrical task systems 10. The k-server problem 11. Randomized k-server algorithms 12. Load-balancing 13. Call admission and circuit-routing 14. Search, trading and portfolio selection 15. Competitive analysis and decision making under uncertainty Appendices Bibliography Index.

2,525 citations


"On Load Shedding in Complex Event P..." refers background in this paper

  • ...Performance of online algorithms is oftentimes measured against their offline counterparts to develop quality guarantees like competitive ratios [12]....

    [...]
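For readers unfamiliar with the term, the competitive ratio mentioned in the excerpt above is commonly defined as follows. The formulation below is the standard textbook one for a utility-maximization setting; the symbols ALG, OPT, σ, c, and α are notation assumed here and are not taken from the source paper.

```latex
% An online algorithm ALG is c-competitive (c >= 1) for a maximization
% problem if there is a constant \alpha >= 0 such that, for every input
% sequence \sigma,
\[
  \mathrm{ALG}(\sigma) \;\ge\; \frac{1}{c}\,\mathrm{OPT}(\sigma) \;-\; \alpha ,
\]
% where OPT(\sigma) is the utility achieved by an optimal offline
% algorithm that sees all of \sigma in advance.
```

The smallest such c is the competitive ratio of ALG; a ratio of 1 would mean the online algorithm loses nothing, up to the additive constant, by not knowing the future.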


Proceedings ArticleDOI
30 Sep 1977
TL;DR: Two approaches to the study of expected running time of algorithms lead naturally to two different definitions of intrinsic complexity of a problem, which are the distributional complexity and the randomized complexity, respectively.
Abstract: 1. Introduction The study of expected running time of algorithms is an interesting subject from both a theoretical and a practical point of view. Basically there exist two approaches to this study. In the first approach (we shall call it the distributional approach), some "natural" distribution is assumed for the input of a problem, and one looks for fast algorithms under this assumption (see Knuth [8]). For example, in sorting n numbers, it is usually assumed that all n! initial orderings of the numbers are equally likely. A common criticism of this approach is that distributions vary a great deal in real life situations; furthermore, very often the true distribution of the input is simply not known. An alternative approach which attempts to overcome this shortcoming by allowing stochastic moves in the computation has recently been proposed. This is the randomized approach made popular by Rabin [10] (also see Gill [3], Solovay and Strassen [13]), although the concept was familiar to statisticians (for example, see Luce and Raiffa [9]). Note that by allowing stochastic moves in an algorithm, the input is effectively being randomized. We shall refer to such an algorithm as a randomized algorithm. These two approaches lead naturally to two different definitions of intrinsic complexity of a problem, which we term the distributional complexity and the randomized complexity, respectively. (Precise definitions and examples will be given in Sections 2 and 3.) To solidify the ideas, we look at familiar combinatorial problems that can be modeled by decision trees. In particular, we consider (a) the testing of an arbitrary graph property from an adjacency matrix (Section 2), and (b) partial order problems on n [...] We will show that for these two classes of problems, the two complexity measures always agree by virtue of a famous theorem, the Minimax Theorem of Von Neumann [14]. The connection between the two approaches lends itself to applications. With two different views (and in a sense complementary to each other) on the complexity of a problem, it is frequently easier to derive upper and lower bounds. For example, using adjacency matrix representation for a graph, it can be shown that no randomized algorithm can determine the existence of a perfect matching in less than O(n²) probes. Such lower bounds to the randomized approach were lacking previously. As another example of application, we can prove that for the partial order problems in (b), assuming uniform …
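The agreement between the two complexity measures described in this abstract is usually stated today as a minimax identity. The notation below is an assumption introduced for illustration, not the abstract's own: 𝒜 is a finite set of deterministic algorithms, 𝒳 a finite set of inputs, c(A, x) the cost of running A on x, R a distribution over 𝒜 (a randomized algorithm), and D a distribution over 𝒳.

```latex
% Minimax identity (finite case): randomized complexity on the left,
% distributional complexity on the right.
\[
  \min_{R}\; \max_{x \in \mathcal{X}}\; \mathbb{E}_{A \sim R}\bigl[c(A, x)\bigr]
  \;=\;
  \max_{D}\; \min_{A \in \mathcal{A}}\; \mathbb{E}_{x \sim D}\bigl[c(A, x)\bigr].
\]
% The direction used for lower bounds: for any fixed R and any fixed D,
\[
  \max_{x \in \mathcal{X}}\; \mathbb{E}_{A \sim R}\bigl[c(A, x)\bigr]
  \;\ge\;
  \min_{A \in \mathcal{A}}\; \mathbb{E}_{x \sim D}\bigl[c(A, x)\bigr].
\]
```

The inequality is what makes the distributional view useful in practice: exhibiting one hard input distribution D for deterministic algorithms yields a lower bound on the worst-case expected cost of every randomized algorithm.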

1,097 citations


Book
27 Apr 2004
TL;DR: This book discusses Real-Time Scheduling Problems, Scheduling Models, Stochastic Scheduling, and Online Deterministic Scheduling as well as some basic Scheduling Algorithms and Complexity.
Abstract: Introduction: Introduction and Notation, Joseph Y-T. Leung; A Tutorial on Complexity, Joseph Y-T. Leung; Some Basic Scheduling Algorithms, Joseph Y-T. Leung.
Classical Scheduling Problems: Elimination Rules for Job-shop Scheduling Problem: Overview and Extensions, Jacques Carlier, Laurent Peridy, Eric Pinson, and David Rivreau; Flexible Hybrid Flowshops, George Vairaktarakis; Open Shop Scheduling, Teofilo F. Gonzalez; Cycle Shop Scheduling, Vadim G. Timkovsky; Reducibility among Scheduling Classes, Vadim G. Timkovsky; Parallel Scheduling for Early Completion, Bo Chen; Minimizing the Maximum Lateness, Hans Kellerer; Approximation Algorithms for Minimizing Average Weighted Completion Time, Chandra Chekuri and Sanjeev Khanna; Minimizing the Number of Tardy Jobs, Marjan van den Akker and Han Hoogeveen; Branch-and-Bound Algorithms for Total Weighted Tardiness, Antoine Jouglet, Philippe Baptiste, and Jacques Carlier; Scheduling Equal Processing Time Jobs, Philippe Baptiste and Peter Brucker; Online Scheduling, Kirk Pruhs, Jiri Sgall, and Eric Torng; Convex Quadratic Relaxations in Scheduling, Jay Sethuraman.
Other Scheduling Models: The Master/Slave Scheduling Model, Sartaj Sahni and George Vairaktarakis; Scheduling in Bluetooth Networks, Yong Man Kim and Ten H. Lai; Fair Sequences, Wieslaw Kubiak; Due-Date Quotation Models and Algorithms, Philip Kaminsky and Dorit Hochbaum; Scheduling with Due-Date Assignment, Valery S. Gordon, Jean-Marie Proth, and Vitaly A. Strusevich; Machine Scheduling with Availability Constraints, Chung-Yee Lee; Scheduling with Discrete Resource Constraints, J. Błażewicz, N. Brauner, and G. Finke; Scheduling with Resource Constraints-Continuous Resources, Joanna Józefowska and Jan Weglarz; Scheduling Parallel Tasks-Algorithms and Complexity, M. Drozdowski; Scheduling Parallel Tasks Approximation Algorithms, Pierre-François Dutot, Grégory Mounié, and Denis Trystram.
Real-Time Scheduling: The Pinwheel: A Real-Time Scheduling Problem, Deji Chen and Aloysius Mok; Scheduling Real-Time Tasks: Algorithms and Complexity, Sanjoy Baruah and Joël Goossens; Real Time Synchronization Protocols, Lui Sha and Marco Caccamo; Fair Scheduling of Real-Time Tasks on Multiprocessors, James Anderson, Philip Holman, and Anand Srinivasan; A Categorization of Real-Time Multiprocessor Scheduling Problems and Algorithms, John Carpenter, Shelby Funk, Philip Holman, Anand Srinivasan, James Anderson, and Sanjoy Baruah; Approximation Algorithms for Scheduling Time-Critical Jobs on Multiprocessor System, Sudarshan K. Dhall; Scheduling Overloaded Real-Time Systems with Competitive/Worst Case Guarantees, Gilad Koren and Dennis Shasha; Minimizing Total Weighted Error for Imprecise Computation Tasks and Related Problems, Joseph Y-T. Leung; Dual Criteria Optimization Problems for Imprecise Computation Tasks, Kevin I-J Ho; Periodic Reward-Based Scheduling and Its Application to Power-Aware Real-Time Systems, Hakan Aydin, Rami Melhem, and Daniel Mosse; Routing Real-Time Messages on Networks, G. Young.
Stochastic Scheduling and Queueing Networks: Offline Deterministic Scheduling, Stochastic Scheduling, and Online Deterministic Scheduling: A Comparative Overview, Michael Pinedo; Stochastic Scheduling with Earliness and Tardiness Penalties, Xiaoqiang Cai and Xian Zhou; Developments in Queueing Networks with Tractable Solutions, Xiuli Chao; Scheduling in Secondary Storage Systems, Alexander Thomasian; Selfish Routing on the Internet, Artur Czumaj.
Applications: Scheduling of Flexible Resources in Professional Service Firms, Yalcin Akcay, Anantaram Balakrishnan, and Susan H. Xu; Novel Metaheuristic Approaches to Nurse Rostering Problems in Belgian Hospitals, Edmund Kieran Burke, Patrick De Causmaecker and Greet Vanden Berghe; University Timetabling, Sanja Petrovic and Edmund Burke; Adapting the GATES Architecture to Scheduling Faculty, R. P. Brazile and K. M. Swigger; Constraint Programming for Scheduling, John J. Kanet, Sanjay L. Ahire, and Michael F. Gorman; Batch Production Scheduling in the Process Industries, Karsten Gentner, Klaus Neumann, Christoph Schwindt, and Norbert Trautmann; A Composite Very-Large-Scale Neighborhood Search Algorithm for the Vehicle Routing Problem, Richa Agarwal, Ravinder K. Ahuja, Gilbert Laporte, and Zuo-Jun "Max" Shen; Scheduling Problems in the Airline Industry, Xiangtong Qi, Jian Yang and Gang Yu; Bus and Train Driver Scheduling, Raymond S. K. Kwan; Sports Scheduling, Kelly Easton, George Nemhauser, and Michael Trick.
Index.

988 citations