Load-aware shedding in stream processing systems
Summary (3 min read)
1. INTRODUCTION
- Distributed stream processing systems (DSPS) are today considered a mainstream technology for building architectures for the real-time analysis of big data.
- This latter aspect is often critical, as input data streams may unpredictably change over time both in rate and content.
- Existing load shedding solutions either randomly drop tuples when bottlenecks are detected or apply a pre-defined model of the application and its input that allows them to deterministically take the best shedding decision.
- The tuple execution duration, in fact, may depend on the tuple content itself.
- Afterwards, Section 3 details LAS whose behavior is then theoretically analyzed in Section 4.
2. SYSTEM MODEL AND PROBLEM DEFINITION
- The authors consider a distributed stream processing system (DSPS) deployed on a cluster where several computing nodes exchange data through messages sent over a network.
- Data injected by source operators is encapsulated in units called tuples and each data stream is an unbounded sequence of tuples.
- Without loss of generality, here the authors assume that each tuple t is a finite set of key/value pairs that can be customized to represent complex data structures.
- On the other hand, the input throughput of the stream may vary, even with a large magnitude, at any time.
- The goal of the load shedder is to maintain the average queuing latency smaller than a given threshold τ by dropping as few tuples as possible while the stream unfolds.
3.1 Overview
- Load-Aware Shedding (LAS) is based on a simple yet effective idea: if the execution duration w(t) of each tuple t in the operator is known, then queuing times can be foreseen and all tuples that would cause the queuing latency threshold τ to be violated can be dropped.
- The value of w(t) is generally unknown.
- Then, it computes the sum of the estimated execution durations of the tuples assigned to the operator, i.e., Ĉ = ∑_{i ∈ [m]\D} ŵ(t_i).
- To enable this approach, LAS builds a sketch on the operator (i.e., a memory efficient data structure) that will track the execution duration of the tuples it processes.
- This solution does not require any a priori knowledge on the stream or system, and is designed to continuously adapt to changes in the input stream or on the operator characteristics.
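The shedding decision sketched above can be illustrated with a few lines of Python. This is a minimal sketch of the idea only, assuming the estimator `estimate_duration` (playing the role of ŵ(t)) is supplied by the caller; in LAS it would be backed by the sketch data structures, and all names here are illustrative, not the authors' code.

```python
import time

def make_shedder(tau, estimate_duration):
    """Decide whether to drop each incoming tuple so that the
    predicted queuing latency stays below the threshold tau.
    `estimate_duration(t)` plays the role of w-hat(t)."""
    state = {"busy_until": 0.0}  # predicted time at which the operator frees up

    def offer(t, now=None):
        now = time.monotonic() if now is None else now
        start = max(state["busy_until"], now)
        queuing_latency = start - now          # predicted wait before t runs
        if queuing_latency > tau:
            return False                       # drop t: threshold would be violated
        state["busy_until"] = start + estimate_duration(t)
        return True                            # forward t to the operator
    return offer
```

With τ = 1.0 and a constant estimated duration of 0.6, the third tuple offered at the same instant is dropped, since its predicted wait (1.2) exceeds the threshold.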
3.2 LAS design
- The operator maintains two Count-Min [4] sketch matrices: the first one, denoted F, tracks the tuple frequencies f_t; the second one, denoted W, tracks the tuples' cumulated execution durations.
- In the positive case, the operator sends the F and W matrices to the load shedder, resets their content, and moves back to the START state.
- While in the SEND state, LS sends to O the current cumulated execution duration estimation Ĉ, piggybacking it with the first tuple t that is not dropped (Listing 3.2, lines 24-26), and moves into the RUN state.
- It then checks if the estimated queuing latency for t satisfies the Check method (Listing 3.2 lines 19-21).
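The two-matrix design can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes ŵ(t) is obtained as the ratio of the two Count-Min point queries (cumulated duration over frequency), and the hash functions are simple salted hashes rather than a proper 2-universal family.

```python
import random

class CountMin:
    """Minimal Count-Min sketch: d rows of w counters, each row
    indexed by a differently salted hash (illustrative only)."""
    def __init__(self, w, d, seed=42):
        rng = random.Random(seed)
        self.w = w
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.rows = [[0.0] * w for _ in range(d)]

    def _cells(self, key):
        for row, salt in zip(self.rows, self.salts):
            yield row, hash((salt, key)) % self.w

    def add(self, key, value=1.0):
        for row, j in self._cells(key):
            row[j] += value

    def query(self, key):
        # min over rows gives an upper-bound estimate of the true sum
        return min(row[j] for row, j in self._cells(key))

class DurationTracker:
    """F tracks tuple frequencies; W tracks cumulated execution
    durations. w-hat(t) is taken as their ratio (a sketch of the
    LAS idea, not the paper's exact code)."""
    def __init__(self, w=54, d=3):
        self.F, self.W = CountMin(w, d), CountMin(w, d)

    def observe(self, t, duration):
        self.F.add(t, 1.0)
        self.W.add(t, duration)

    def estimate(self, t):
        f = self.F.query(t)
        return self.W.query(t) / f if f > 0 else 0.0
```

For example, after observing the same tuple key with durations 2.0 and 4.0, the estimated per-tuple duration is 3.0.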
4. THEORETICAL ANALYSIS
- Data streaming algorithms strongly rely on pseudo-random functions that map elements of the stream to uniformly distributed image values to keep the essential information of the input stream, regardless of the stream elements frequency distribution.
- First the authors study the correctness and optimality of the shedding algorithm, under full knowledge assumption (i.e., the shedding strategy is aware of the exact execution duration wt for each tuple t).
- The proofs of these equations as well as some numerical applications to illustrate the accuracy are discussed in [8].
- Then, according to Theorem 4.1, LAS is an (ε, δ)-optimal algorithm for load shedding, as defined in Problem 2.1, over all possible data streams σ.
5. EXPERIMENTAL EVALUATION
- In this section the authors evaluate the performance obtained by using LAS to perform load shedding.
- The authors will first describe the general setting used to run the tests and will then discuss the results obtained through simulations and with a prototype of LAS integrated within Apache Storm.
- Due to space constraints, the exhaustive presentation of these experiments is available in the companion paper [8].
5.1 Setup
- In their tests the authors consider both synthetic and real datasets.
- Conversely, an input throughput larger than 1/W will result in an under-provisioned system.
- In order to generate 100 different streams, the authors randomize the association between the wn execution duration values and the n distinct items: for each of the wn execution duration values they pick uniformly at random n/wn different values in [n] that will be associated to that execution duration value.
- This means that each single experiment reports the mean outcome of 5, 000 independent runs.
- Among other information, the tweets are enriched with a field mention containing the entities mentioned in the tweet.
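The randomized association between execution duration values and items described above can be sketched as follows. This is a hedged reconstruction of the synthetic-dataset construction (function name and signature are illustrative): the n distinct items are partitioned uniformly at random into equally sized groups, one per execution-duration value.

```python
import random

def assign_durations(n, durations, rng=None):
    """Randomly partition the n distinct items into len(durations)
    equally sized groups, associating each group with one
    execution-duration value."""
    rng = rng or random.Random()
    assert n % len(durations) == 0, "n must split evenly across duration values"
    items = list(range(n))
    rng.shuffle(items)                 # random association, as in the paper's setup
    group = n // len(durations)
    mapping = {}
    for k, w in enumerate(durations):
        for item in items[k * group:(k + 1) * group]:
            mapping[item] = w
    return mapping
```

Re-running this with 100 different seeds yields the 100 different streams used in the tests.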
5.2 Simulation Results
- In this section the authors analyze through simulations the sensitivity of LAS while varying several characteristics of the input load.
- As expected, in this latter case all algorithms perform at the same level as load shedding is superfluous.
- At the beginning of this phase both Straw-Man and LAS perform poorly, with queuing latencies largely above τ.
- As the phase unfolds, LAS quickly updates its data structures and converges toward the given threshold, while Straw-Man diverges as tuples continue to be enqueued on the operator, worsening the bottleneck effect.
5.3 Prototype
- The source reads from the dataset and emits the tuples consumed by bolt LS.
- The authors assumed that this leads to a long execution duration for media (e.g., possibly caused by an access to an external DB to gather historical data), an average execution duration for politicians and a fast execution duration for others (e.g., possibly because these tweets are not decorated).
- Figure 7 reports the average completion latency as the stream unfolds.
- Conversely, StrawMan completion latencies are at least one order of magnitude larger.
- These results confirm the effectiveness of LAS in keeping a close control on queuing latencies (and thus provide more predictable performance) at the cost of dropping a fraction of the input load.
7. CONCLUSIONS
- The authors presented LAS, a novel solution for load shedding in DSPS.
- LAS is based on the observation that the load on operators depends both on the input rate and on the content of tuples. It leverages sketch data structures to efficiently collect, at runtime, information on the operator load characteristics, and then uses this information to implement a load shedding policy aimed at maintaining the average queuing latencies close to a given threshold.
- Furthermore, tests conducted both in a simulated environment and on a prototype implementation confirm that, by taking into account the specific load imposed by each tuple, LAS can provide performance that closely approaches a given target, while dropping a limited number of tuples.
Frequently Asked Questions (9)
Q2. How do the authors fine-tune the parameter r to log(1/δ)?
By finely tuning the parameter r to log(1/δ), under the assumption of [8], the authors are then able to (ε, δ)-approximate w(t) for any t ∈ [n].
Q3. What is the first stream processing system where shedding has been proposed?
Aurora [1] is the first stream processing system where shedding has been proposed as a technique to deal with bursty input traffic.
Q4. What is the definition of bursty input load?
Bursty input load represents a problem for DSPS as it may create unpredictable bottlenecks within the system that lead to an increase in queuing latencies, pushing the system in a state where it cannot deliver the expected quality of service (typically expressed in terms of tuple completion latency).
Q5. What is the LAS cost model for shedding tuples?
At the arrival of the i-th tuple, subtracting from Ĉ the (physical) time elapsed from the emission of the first tuple provides LAS with an estimation q̂(i) of the queuing latency q(i) for the current tuple.
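That estimation is a one-line computation; the helper below writes it out explicitly (the clamp at zero is an assumption here, since physical time elapsed can exceed the estimated cumulated duration and a queuing latency cannot be negative).

```python
def estimate_queuing_latency(C_hat, first_emit_time, now):
    """q-hat(i): the estimated cumulated execution duration C-hat of the
    tuples already forwarded, minus the physical time elapsed since the
    first of them was emitted, clamped at zero."""
    return max(0.0, C_hat - (now - first_emit_time))
```

For example, with Ĉ = 5.0 seconds of estimated pending work and 3.0 seconds elapsed, the current tuple is predicted to wait 2.0 seconds.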
Q6. What is the transition to phase F?
The transition to phase F is extremely abrupt as the input throughput is brought back to the equivalent of 0% of under-provisioning, but the cost to handle each tuple on the operator is doubled.
Q7. What is the transition to phase C?
The transition to phase C brings the system back in the initial configuration, while in phase D the change in the tuple frequency distribution is managed very differently by each solution: both Full Knowledge and LAS compensate this change by starting to drop more tuples, but still maintaining the average queuing latency close to the desired threshold τ .
Q8. What is the result of the sketch data structures used to trace tuples?
This result stems from the fact that the sketch data structures used to trace tuple execution durations perform at their best on strongly skewed distributions, rather than on uniform ones.
Q9. What is the simplest way to restrict a tuple to a topology?
The authors restrict their model to a topology with an operator LS (load shedder) that decides which tuples of its outbound data stream σ, consumed by operator O, shall be dropped.