
THEMIS: Fairness in Federated Stream Processing under Overload


City, University of London Institutional Repository
Citation: Kalyvianaki, E., Fiscato, M., Salonidis, T. and Pietzuch, P. (2016). THEMIS:
Fairness in Federated Stream Processing under Overload. Paper presented at the 2016
ACM International Conference on Management of Data (SIGMOD), 26 Jun - 01 Jul 2016,
San Francisco, USA.
This is the accepted version of the paper.
This version of the publication may differ from the final published
version.
Permanent repository link: https://openaccess.city.ac.uk/id/eprint/13546/
Link to published version:

THEMIS: Fairness in Federated Stream Processing under Overload
Evangelia Kalyvianaki
City University London
sbbj913@city.ac.uk
Marco Fiscato
Imperial College London
mfiscato@doc.ic.ac.uk
Theodoros Salonidis
IBM TJ Watson Research Center
tsaloni@us.ibm.com
Peter Pietzuch
Imperial College London
prp@imperial.ac.uk
ABSTRACT
Federated stream processing systems, which utilise nodes from multiple independent domains, can be found increasingly in multi-provider cloud deployments, internet-of-things systems, collaborative sensing applications and large-scale grid systems. To pool resources from several sites and take advantage of local processing, submitted queries are split into query fragments, which are executed collaboratively by different sites. When supporting many concurrent users, however, queries may exhaust available processing resources, thus requiring constant load shedding. Given that individual sites have autonomy over how they allocate query fragments on their nodes, it is an open challenge how to ensure global fairness on processing quality experienced by queries in a federated scenario.
We describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments. It executes queries in a globally fair fashion and provides users with constant feedback on the experienced processing quality for their queries. THEMIS associates stream data with its source information content (SIC), a metric that quantifies the contribution of that data towards the query result, based on the amount of source data used to generate it. We provide the BALANCE-SIC distributed load shedding algorithm that balances the SIC values of result data. Our evaluation shows that the BALANCE-SIC algorithm yields balanced SIC values across queries, as measured by Jain’s Fairness Index. Our approach also incurs a low execution time overhead.
1. INTRODUCTION
Federated stream processing systems (FSPSs) [14, 13] continuously process data streams using computation and network resources from several autonomous sites [9]. Submitted queries are split into query fragments, which can be deployed across multiple sites. For example, a cloud-based stream processing system may span more than one cloud provider to benefit from lower costs, higher resilience or closer proximity to data sources. In collaborative e-science applications, FSPSs such as OGSA-DAI [3] and Astro-WISE [1] pool resources from multiple organisations to provide a shared processing service for high stream rates and computationally expensive queries. Participatory sensing and smart city infrastructures [5, 31] require deployments of systems that combine independent domains with distinct data or processing capabilities for a large user base.
A challenge is that FSPSs are likely to suffer from long-term overload conditions. As a shared processing platform with many users, they can experience a “tragedy of the commons” [19] when users submit more queries than the available resources can sustain. Instead of adopting a rigid admission policy, which rejects user queries when available resources are low, it is more desirable for an FSPS to use load-shedding techniques [33, 27]. Under load shedding, the FSPS provides a best-effort service by reducing the resource requirements of queries through dropping a fraction of tuples from the input data streams.
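The simplest form of the tuple dropping described above is random shedding, which §2.3 discusses as the baseline. The Python sketch below only illustrates that idea: the function name `random_shed` and its `drop_fraction` parameter are hypothetical and not part of THEMIS. Each tuple is kept independently with probability 1 - `drop_fraction`, so on average that fraction of the input stream is discarded.

```python
import random
from typing import Iterable, List, Optional


def random_shed(tuples: Iterable, drop_fraction: float,
                rng: Optional[random.Random] = None) -> List:
    """Hypothetical random shedder: drop each tuple independently with
    probability drop_fraction, regardless of its content."""
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform in [0, 1), so a tuple survives with probability 1 - drop_fraction.
    return [t for t in tuples if rng.random() >= drop_fraction]


# Usage: shed roughly 30% of a stream of 1000 tuples (seeded for repeatability).
kept = random_shed(range(1000), 0.3, random.Random(42))
```

Because the dropped tuples are chosen blindly, this policy cannot steer which queries lose the most information, which is the gap a quality-aware shedder has to close.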
Appropriate load shedding in an FSPS, however, is complicated by the fact that individual sites are autonomous and may implement their own resource allocation policies. For example, a site may prioritise queries belonging to local users at the expense of external query fragments. Without coordination of load-shedding decisions across sites, multi-site queries may experience significant variations in processing quality, depending on the load distribution across sites. It is therefore an open challenge how to ensure that queries spanning multiple autonomous sites in an FSPS experience globally fair processing quality under overload conditions.
Many stream processing systems support load shedding mechanisms to handle overload conditions. Load shedding mechanisms that operate at the granularity of individual nodes [33, 10, 35], however, cannot achieve fair shedding decisions for queries spanning multiple nodes. Proposals for distributed load shedding [34, 44] associate a utility function with query output rates and aim to maximise the sum of utilities, which is not a representative measure of fairness. In addition, they assume a special structure and a priori knowledge of the utility functions. Load shedding decisions are either controlled by a centralised entity or based on pre-computed shedding plans, and neither is practical in an FSPS in which domains retain control. Operator-specific semantic shedding approaches, e.g. for joins [21, 26, 17], aggregates [10, 35] or XML streams [38], cannot be applied in a federated context when users employ diverse sets of operators or customised, user-defined ones.
We describe a new approach for distributed load shedding in an
FSPS that treats queries in a globally fair fashion. The key idea
is to define a query-independent metric to measure the quality of
processing that query fragments have experienced, and then to use
this information for load shedding:
(1) We associate stream data with a metric called source information content (SIC), which represents the contribution of that data to the result in a query-independent way. The SIC metric quantifies the amount of source data that was used to generate a given query result data item. Intuitively, data that was aggregated over many stream sources is considered to be more important to the final query result. The SIC metric thus decouples processing quality from the semantics of the operators and provides a query-independent way to capture the quality of query processing with respect to tuple shedding. This makes it particularly suited to a diverse set of user queries that execute operators with various semantics, including user-defined operators.
(2) Overloaded nodes in the FSPS invoke a distributed semantic fair load-shedding algorithm that aims to balance the SIC values of query results across all queries, referred to as the BALANCE-SIC fairness policy. This policy balances the SIC values of query results (i.e. it maximises Jain’s Fairness Index, a normalised scalar metric that quantifies balance). It effectively utilises the processing capacity of FSPS nodes, given the practical constraints of the placement of queries on sites and their autonomy.
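For reference, Jain’s Fairness Index over n values $x_1, \dots, x_n$ is $J = (\sum_i x_i)^2 / (n \sum_i x_i^2)$; for non-negative values that are not all zero it ranges from $1/n$ (all benefit concentrated on one query) to 1 (all values equal). A minimal Python sketch, meant to be applied to per-query result SIC values; the function name is our own, and returning 1.0 for an all-zero input is our convention, not the paper’s:

```python
from typing import Sequence


def jain_fairness_index(values: Sequence[float]) -> float:
    """Jain's fairness index (sum x)^2 / (n * sum x^2); 1.0 means perfectly balanced."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    sum_sq = sum(x * x for x in values)
    if sum_sq == 0:
        return 1.0  # all zeros: treated as balanced (illustrative convention)
    return sum(values) ** 2 / (n * sum_sq)
```

For example, three queries with result SIC values 0.5, 0.5 and 0.5 give J = 1.0, whereas 1.0, 0.0 and 0.0 give J = 1/3.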
When queries are assigned across FSPS sites, it becomes challenging to control per-node tuple shedding and yet provide globally BALANCE-SIC-fair processing. This stems from the fact that shedding tuples at a node affects not only its resource availability but also the processing quality of other queries. Since queries span sites and share resources, such effects spread across sites and influence shedding decisions on the remaining nodes. It is therefore non-trivial to control tuple shedding globally in a federated setting.
In our approach, each node makes independent yet informed shedding decisions based on the overall processing quality of its locally-hosted queries. Queries provide continuous feedback on their processing quality through the SIC metric. The shedding of tuples eventually converges to global fairness as each node continuously adjusts its shedding behaviour in response to that of other FSPS nodes.
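The per-node behaviour described above can be made concrete with a deliberately simplified sketch. It is a hypothetical greedy heuristic, not the BALANCE-SIC algorithm of §5: given a tuple budget `capacity` for the current interval, the pending tuples of each locally hosted query (represented only by their SIC values) and the latest result SIC feedback per query, the node repeatedly serves the query with the lowest estimated result SIC using that query’s most valuable pending tuple. As a further simplification, it assumes that keeping a tuple raises the query’s result SIC by exactly that tuple’s SIC value.

```python
from typing import Dict, List


def greedy_local_shed(capacity: int,
                      pending: Dict[str, List[float]],
                      result_sic: Dict[str, float]) -> Dict[str, List[float]]:
    """Hypothetical local heuristic (NOT BALANCE-SIC itself).

    capacity:   number of tuples this node can process in the current interval
    pending:    query id -> SIC values of the tuples queued for that query
    result_sic: query id -> latest result SIC feedback for that query
    Returns query id -> SIC values of the kept tuples; all others are shed.
    """
    estimate = {q: result_sic.get(q, 0.0) for q in pending}
    queues = {q: sorted(v, reverse=True) for q, v in pending.items()}
    kept: Dict[str, List[float]] = {q: [] for q in pending}
    for _ in range(capacity):
        waiting = [q for q, queue in queues.items() if queue]
        if not waiting:
            break
        # Serve the query whose estimated result SIC is currently the lowest...
        q = min(waiting, key=lambda qid: estimate[qid])
        # ...with its most valuable pending tuple, then update the local estimate.
        x = queues[q].pop(0)
        kept[q].append(x)
        estimate[q] += x
    return kept


# Usage: query "a" lags behind "b", so it receives the budget first.
print(greedy_local_shed(3, {"a": [0.1, 0.2], "b": [0.3]}, {"a": 0.0, "b": 0.5}))
```

In a federated deployment, every node would run such a local step with fresh feedback, and the nodes’ decisions would interact through the queries they share; the convergence to global fairness claimed above is a property of the paper’s algorithm, not of this sketch.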
To demonstrate the practicality of our fair load-shedding approach, we describe THEMIS, an FSPS for overloaded deployments.¹ Our evaluation of THEMIS shows that: (a) the SIC metric captures the result degradation across a variety of query types; (b) in contrast to the baseline of random shedding, THEMIS achieves 33% fairer query processing according to Jain’s Fairness Index, even with skewed workload distributions; and (c) our approach has low overhead and scales well with the number of nodes and queries.
In summary, the contributions and the paper outline are:
1. a query-independent model and a metric called SIC for quantifying the quality of stream processing based on the amount of information contributed by data sources (§4), and a practical approximation for computing it (§6);
2. the definition of BALANCE-SIC fairness in an overloaded FSPS based on the processing quality of queries, and a distributed algorithm for globally BALANCE-SIC-fair semantic load shedding in an FSPS, which takes the loss of information suffered by queries into account (§5);
3. the design and implementation of THEMIS, an FSPS that efficiently implements the BALANCE-SIC fair load-shedding policy (§6); and
4. results from an experimental evaluation demonstrating that the approach achieves fair query processing under various workloads in a federated setting (§7).
¹ According to Greek mythology, THEMIS is the Titan goddess of law and order.
Figure 1: Example of a multi-site FSPS deployment for urban micro-climate monitoring (sites: Paris, cloud-based data center; Rome, governmental institute data center; Mexico, research institute data center; legend: sensors, query fragment, node)
2. OVERLOAD IN FEDERATED STREAM PROCESSING
In this section, we describe the problem of fairness in query processing that arises in an overloaded FSPS. We identify the key characteristics of an FSPS using an example application that processes micro-climate sensor data (§2.1). We then introduce the BALANCE-SIC fairness goal for an overloaded FSPS (§2.2) and discuss related work (§2.3).
2.1 Federated Stream Processing
Consider a use case of a globally-distributed FSPS for urban micro-climate monitoring. Figure 1 shows a deployment of such a system across three sites (i.e. Rome, Paris and Mexico), with environmental sensors as data sources. The FSPS collects data from a range of sensors, such as air temperature, humidity and carbon monoxide sensors, and processes the data in real time for analysis. Queries are issued, e.g. by government agencies for urban planning, transport authorities, citizens with respiratory problems and meteorological researchers. Sample high-level streaming queries may continuously report “the 10 highest values of carbon monoxide concentration measurements on highways in Mexico every minute” and “the covariance matrix between measurements of (temperature, airflow) and (carbon dioxide, nitrogen) in Paris every 10 minutes”.
Each site consists of a data centre with physical nodes running a local distributed stream processing system, and we assume seamless integration across all these systems at the federated sites [12].
Below, we provide a high-level overview of data stream processing in an FSPS. Sources generate tuples for processing by queries. Queries are subdivided into query fragments and deployed at one or more sites. A query fragment consists of one or more operators, and each fragment of the same query is deployed on a different FSPS node. Query fragments use resources, i.e. CPU, memory, disk space and network bandwidth, to process incoming tuples and generate output tuples. Output tuples may be further processed by fragments of the same query, until result tuples are sent to the user issuing the query. Nodes share their resources among fragments belonging to different queries.
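The processing model above can be summarised as a small data model. The following Python sketch is illustrative only: the classes `Query` and `Fragment`, their fields and the helper functions are hypothetical and do not reflect THEMIS’s actual implementation. It encodes two rules stated in the text: each fragment of a query runs on a different node, and a node shares its resources among fragments of different queries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Fragment:
    """One query fragment: a group of operators placed on a single node (hypothetical model)."""
    query_id: str
    operators: List[str]  # illustrative operator names, e.g. ["filter", "avg"]
    node: str             # FSPS node hosting the fragment
    site: str             # autonomous site that owns the node


@dataclass
class Query:
    query_id: str
    fragments: List[Fragment] = field(default_factory=list)

    def sites(self) -> Set[str]:
        """Sites spanned by this query (more than one for a multi-site query)."""
        return {f.site for f in self.fragments}

    def placement_is_valid(self) -> bool:
        """Each fragment of the same query must be deployed on a different node."""
        nodes = [f.node for f in self.fragments]
        return len(nodes) == len(set(nodes))


def fragments_per_node(queries: List[Query]) -> Dict[str, List[str]]:
    """Map each node to the ids of the queries whose fragments share its resources."""
    hosted: Dict[str, List[str]] = {}
    for q in queries:
        for f in q.fragments:
            hosted.setdefault(f.node, []).append(q.query_id)
    return hosted
```

For instance, a query with one fragment on a Paris node and one on a Rome node spans two sites, and a second query placed on the same Paris node competes with it for that node’s resources.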
Below, we identify three main characteristics regarding user behaviour and resource utilisation in such an FSPS:
C1. Skewed query workload distribution. Sites primarily host queries of local users, so the overall load distribution across sites may be skewed, with some sites being more loaded than others. In general, query fragments cannot be allocated uniformly across sites due to local policy constraints or the reliance on local sources. For example, queries using forecasting algorithms may be restricted to running at a given site due to licensing constraints, which may limit the number of authorised users or remote sites using the system.
C2. Permanent resource overload. Due to the shared nature of an FSPS, we assume that the system is constantly overloaded, i.e. its resources are insufficient for the perfect execution of all queries. In the above example, queries are issued by a large user population, leading to high demand. A common strategy for an FSPS to handle overload is to use tuple shedding [27, 33].
C3. Site autonomy. The collaborative nature of an FSPS means that a site should accept incoming queries, even under high load. However, sites belonging to different administrative domains are managed autonomously. It is therefore infeasible to assume centralised control over all tuple shedding decisions, enforced across all sites. Instead, sites elect to cooperate, having only a partial view of all resource allocation decisions across the whole FSPS.
2.2 Fairness in FSPS
The problem of how to implement fair query processing arises naturally in an overloaded FSPS. There exist many different approaches to address overload conditions. For example, admission control rejects incoming queries under overload [41, 40]. Such methods are not applicable in a federated context because its collaborative nature means that submitted queries must be accepted. Other approaches redistribute operators for load balancing [43, 40, 42, 11]. However, query placement in an FSPS is typically controlled by users, e.g. to leverage characteristics such as proximity.
We employ distributed load shedding to address overload conditions in an FSPS. By using load shedding, we assume that users agree to use the FSPS and accept degraded processing of their queries. We assume that users submit queries whose results remain useful even when their processing is degraded due to load shedding, such as aggregates [10], including averages and counts, as well as top-k queries. Finally, we use distributed load shedding to comply with site autonomy (see C3 in §2.1).
There are two challenges in implementing distributed load shedding. First, there is a need for a query-independent processing metric to capture the impact of shedding on the quality of query processing. Ideally, we require a measure for processing quality that quantifies the processing degradation under shedding but is query-independent, i.e. it does not have to be adapted manually to the semantics of specific queries. With such a measure, it becomes possible to compare the impact of tuple shedding across queries and hence guide shedding decisions according to a fairness policy. In §4, we introduce the SIC query-independent metric that captures the quality of processing by measuring the contribution of source tuples actually used for generating query results.
Second, depending on the deployment of query fragments to sites, some queries may be penalised more heavily by overload than others. We therefore want to achieve global fairness across all queries by enforcing load shedding at all sites so that all queries are equally penalised by the shedding. We achieve this by aiming to equalise a fairness measure of all queries after shedding. It is a challenge to implement fairness across queries executing on overloaded, distributed and autonomous sites, regardless of their deployment. In §5, we present a new distributed load shedding algorithm that maintains BALANCE-SIC fairness of queries.
2.3 Related Work
The research community recognises the need for FSPSs and has explored the relevant research challenges. Tatbul [32] argues for the integration of multiple stream processing engines for a variety of applications and pinpoints the challenges of dealing with heterogeneous query semantics. Botan et al. [12] present MaxStream, an FSPS for business intelligence applications. Our focus instead is on fairness in an overloaded FSPS using load shedding.
Centralised load shedding. Early proposals for load shedding focus on single-node systems [4, 33, 28]. A simple way to address overload is through random shedding [33], which discards arbitrary tuples. This baseline approach is easy to implement and has low overhead; however, it offers no control over which tuples are shed.
In contrast, semantic shedding discards tuples using a function that correlates them with their contribution to the quality of the query output [33]. Tuples are discarded in a way that maximises result quality. Carney et al. [15] describe generic drop- and value-based functions to quantify the contribution of tuples to the result. A drop-based function specifies how the result quality of a query decreases with the number of discarded tuples. Many systems discard tuples so as to maximise the output tuple rate [17, 34]. In some cases, a value-based function correlates the query output quality with the values of the output tuples [23]. In contrast, our goal is to maximise the contribution of the source tuples used for processing.
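Unlike random dropping, a semantic shedder ranks tuples by a utility function before discarding any. The sketch below assumes a caller-supplied `utility` callable and a per-interval tuple budget `capacity` (both hypothetical), and keeps the highest-utility tuples using the standard-library `heapq.nlargest`. Supplying a tuple’s SIC value as `utility` would keep the most valuable tuples in the SIC sense, but this remains an illustration of the general technique, not the shedding algorithm of THEMIS.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def semantic_shed(tuples: Iterable[T], capacity: int,
                  utility: Callable[[T], float]) -> List[T]:
    """Keep the `capacity` tuples with the highest utility and discard the rest."""
    if capacity <= 0:
        return []
    # nlargest returns the kept tuples sorted by descending utility.
    return heapq.nlargest(capacity, tuples, key=utility)


# Usage: (id, utility) pairs; a budget of 2 keeps "b" and "c" and sheds "a".
print(semantic_shed([("a", 0.1), ("b", 0.5), ("c", 0.3)], 2, lambda t: t[1]))
```

Note that the kept tuples come back in descending utility order rather than arrival order; a real shedder would typically preserve stream order.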
There exist semantic load shedding approaches for specific operator types, such as joins [21, 26, 17], aggregates [10, 35] and XML operators [38]. These approaches require domain knowledge of the operator semantics, while we treat operators as black boxes.
Distributed load shedding. The problem of distributed load allocation has been studied for stream processing systems. Zhao et al. [44] consider the allocation problem for applications with tasks modelled as synchronous and asynchronous forks and joins, noting that this approach can be applied to distributed stream processing. Their work emphasises a theoretical framework for convergence to an optimal solution and presents simulations over two queries and three output streams. In contrast, we provide a fair stream processing system based on the contributions of source tuples and evaluate a prototype implementation.
Tatbul et al. [34] employ distributed shedding to maximise the total weighted throughput of queries by computing the drop selectivity of random or window drop operators inserted at the input streams of a stream processing system. Shedding decisions are made sequentially by each node along a query, starting from the leaves and propagating via metadata up to the input nodes. They assume identical query layouts across nodes (e.g. all root components are deployed on the same node), which is not applicable in a federated system. Finally, the scalability of the approach remains unclear, as the simulation-based evaluation only includes a handful of applications. In our prototype evaluation, we execute several hundred queries across tens of nodes.
Both approaches [34, 44] perform load shedding to maximise the sum of utility functions, but sum maximisation does not achieve fairness. In addition, they require utility functions of a special structure (either linear weighted functions [34] or concave functions [44] of rate), which does not capture query utility in practice. Finally, they require a priori knowledge of the utility functions, which is challenging to estimate offline. In contrast, our approach targets fairness without assuming specific, a priori utility functions. The only assumption is that the “utility” (as captured by the SIC metric) decreases with shedding and is implicitly modelled during system operation through the propagation and updating of the SIC metric in the data tuples.
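A small worked example shows why maximising the sum of utilities does not imply fairness. The two allocations below are made-up utility values for two queries, and Jain’s Fairness Index, $(\sum_i x_i)^2 / (n \sum_i x_i^2)$, is used as the balance measure:

```python
def jain_fairness_index(values):
    """Jain's fairness index for non-negative values: 1.0 means perfectly balanced."""
    n = len(values)
    sum_sq = sum(x * x for x in values)
    return sum(values) ** 2 / (n * sum_sq) if sum_sq else 1.0


# Two hypothetical allocations of per-query utility (e.g. result SIC) for two queries.
allocations = {
    "max_sum": [1.8, 0.2],   # higher total utility, heavily skewed
    "balanced": [0.9, 0.9],  # lower total utility, evenly spread
}

for name, utils in allocations.items():
    print(f"{name}: sum={sum(utils):.2f} jain={jain_fairness_index(utils):.3f}")
# max_sum: sum=2.00 jain=0.610
# balanced: sum=1.80 jain=1.000
```

A sum-maximising shedder would choose the first allocation, even though one query receives almost nothing; a fairness objective prefers the second.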
An important issue for load shedding is the selection of drop locations in a query plan [33, 10]. The most efficient way is to discard tuples at upstream operators, close to sources [15, 20, 29]. This, however, is difficult to do in an FSPS because it requires global information about query plans that span multiple sites.

References
The Tragedy of the Commons
A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems
Web Services Architecture
Aurora: a new model and architecture for data stream management
The CQL continuous query language: semantic foundations and query execution
Frequently Asked Questions
Q1. What are the contributions in this paper?

The authors describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments. The authors provide the BALANCE-SIC distributed load shedding algorithm that balances the SIC values of result data. Their approach also incurs a low execution time overhead. 

The assignment of SIC values to derived tuples is performed as per Equation (3), which requires the sets of input and output tuples. 
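Equation (3) itself is not reproduced in this excerpt. Purely to illustrate how an operator could assign SIC values to derived tuples from its sets of input and output tuples, the sketch below assumes that the output tuples of one operator invocation share the total SIC of its input tuples equally. This rule, and the function name `propagate_sic`, are our assumptions and may differ from the paper’s Equation (3).

```python
from typing import List, Sequence


def propagate_sic(input_sic: Sequence[float], num_outputs: int) -> List[float]:
    """ASSUMED rule (not taken from the paper's Equation (3)): split the summed SIC
    of the input tuples equally among the output tuples of one operator invocation."""
    if num_outputs < 0:
        raise ValueError("num_outputs must be non-negative")
    if num_outputs == 0:
        return []  # e.g. a filter that emits nothing: the input SIC is lost
    share = sum(input_sic) / num_outputs
    return [share] * num_outputs


# Usage: three input tuples (SIC 0.1, 0.2, 0.3) produce two output tuples.
print(propagate_sic([0.1, 0.2, 0.3], 2))
```

Under this assumed rule, SIC is conserved by every operator invocation that emits at least one tuple and is lost when an operator emits nothing, consistent with the observation that derived tuples can be “lost”, e.g. due to filters and joins.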

To reduce the impact of delays when the query coordinator disseminates result SIC values to the nodes hosting query fragments, the load shedder estimates the result SIC values of queries based on its local shedding.

Certain operators in the query graph are connected to a finite set of sources, denoted by $S$, which produce source tuples at time-variant rates.

The algorithm follows a gradient ascent approach to gradually increase the result SIC values of all queries while minimising the pairwise SIC differences of the two queries with the lowest SIC values.

It is challenging in practice to capture accurately the sets $\tilde{T}^S$, $T^S$, $\tilde{T}^R$ and $T^R$, because source tuples are successively transformed into derived tuples by operators and some are shed: derived tuples are “lost”, e.g. due to filters and joins, which only select a subset of their input tuples.

In their model, the SIC metric captures the importance of tuples, i.e. the higher the SIC value, the more important the tuple. The algorithm thus always keeps the most valuable tuples ($\max(x^{SIC})$ in line 16).

The SIC value of an individual source tuple $t_s$ is inversely proportional to $|T^S_s|$ and is also normalised by the number of sources $|S|$ in a query, which yields a query-independent metric.

Figure 11 shows that, when more queries are multi-fragmented, the BALANCE-SIC fairness algorithm converges to a fairer system, as more queries span nodes. 

The query SIC value of result tuples is:
$$q^{SIC} := \sum_{t_r \in \tilde{T}^R} t_r^{SIC}, \tag{4}$$
where the authors only consider result tuples $t_r \in \tilde{T}^R \subseteq T^R$ that are derived from source tuples ∈ …
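A minimal sketch of the two SIC formulas quoted above, under stated assumptions: we read “inversely proportional to $|T^S_s|$ and normalised by $|S|$” as $t_s^{SIC} = 1 / (|S| \cdot |T^S_s|)$, with $|T^S_s|$ counted over one processing interval, and for the worked example we assume that operators conserve SIC, so result tuples carry the total SIC of the source tuples they were derived from. The function names are hypothetical.

```python
from typing import Dict, Iterable


def source_tuple_sic(tuples_per_source: Dict[str, int]) -> Dict[str, float]:
    """SIC of one source tuple from source s, read as 1 / (|S| * |T^S_s|)."""
    num_sources = len(tuples_per_source)
    return {s: 1.0 / (num_sources * n)
            for s, n in tuples_per_source.items() if n > 0}


def query_sic(result_tuple_sic: Iterable[float]) -> float:
    """Equation (4): q^SIC is the sum of the SIC values of the considered result tuples."""
    return sum(result_tuple_sic)


# Two sources: s1 emits 4 tuples and s2 emits 2 tuples in the interval.
sic = source_tuple_sic({"s1": 4, "s2": 2})  # {"s1": 0.125, "s2": 0.25}
perfect = query_sic([sic["s1"]] * 4 + [sic["s2"]] * 2)  # nothing shed: 1.0
degraded = query_sic([sic["s1"]] * 2 + [sic["s2"]] * 2)  # half of s1 shed: 0.75
```

Under these assumptions, a query that processes all source data reaches $q^{SIC} = 1$, and every shed source tuple lowers its result SIC by that tuple’s share, which is what makes SIC values comparable across queries.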

The solution of [44] is obtained using Matlab, and Jain’s fairness index for the resulting utilities’ distribution (normalised log-output rates) equals 0.87.