
FLOWPROPHET: Generic and Accurate Traffic Prediction for Data-Parallel Cluster Computing

Hao Wang (SJTU and HKUST), Li Chen (HKUST), Kai Chen (HKUST), Ziyang Li (NUDT and HKUST), Yiming Zhang (NUDT), Haibing Guan (SJTU), Zhengwei Qi (SJTU), Dongsheng Li (NUDT), Yanhui Geng (Huawei)

This work was performed when Hao Wang and Ziyang Li were intern students at the SING Group @ HKUST.
Abstract—Data-parallel computing frameworks (DCFs) such as MapReduce, Spark, and Dryad have tremendous applications in big data and cloud computing, and generate massive numbers of flows in data center networks. In this paper, we design and implement FLOWPROPHET, a general framework to predict traffic flows for DCFs. To this end, we analyze and summarize the common features of popular DCFs, and gain a key insight: since application logic in DCFs is naturally expressed by directed acyclic graphs (DAGs), the DAG contains the necessary time and data dependencies for accurate flow prediction. Based on this insight, FLOWPROPHET extracts DAGs from user applications, and uses the time and data dependencies to calculate the flow information 4-tuple, (source, destination, flow_size, establish_time), ahead of time for all flows. We also provide a generic programming interface to FLOWPROPHET, so that current and future DCFs can deploy FLOWPROPHET readily. We implement FLOWPROPHET on both Spark and Hadoop, and perform extensive evaluations on a testbed with 37 physical servers. Our implementation and experiments demonstrate that, ahead of time and with minimal cost, FLOWPROPHET achieves almost 100% accuracy in source, destination, and flow size predictions. With accurate prediction from FLOWPROPHET, the job completion time of a Hadoop TeraSort benchmark is reduced by 12.52% on our cluster with a simple network scheduler.
I. INTRODUCTION
Data-parallel computing frameworks (DCFs) such as MapReduce [1], Dryad [2], and Spark [3] have tremendous applications, especially in big data and cloud computing. DCFs greatly enhance programmers' productivity by abstracting away implementation details, so that programmers can focus on the application logic without worrying about resource contention, task distribution, and so on. They only need to apply the APIs (e.g., filter(), map(), reduce()) to express their logic and manipulate their dataset as if on a single machine.
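As a concrete illustration of this single-machine style of programming, consider the following minimal Spark-style word count in Scala; the SparkContext sc and the HDFS paths are assumptions made for this example, not taken from the paper:

// The programmer chains high-level operators as if working on a local
// collection; the framework turns this into distributed tasks.
// "sc" is an assumed SparkContext; the paths are hypothetical.
val counts = sc.textFile("hdfs://input/corpus")
  .flatMap(line => line.split(" "))
  .filter(word => word.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://output/wordcount")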
DCFs effectively decouple the detailed distributed computing implementation from user programs. However, the lower-level implementation details hold the key to better application performance, and much research effort has recently been devoted to this direction. On the micro level, flow-based optimization mechanisms (e.g., [4]–[8]) attempt to minimize the average completion time of flows or groups of flows by exploiting flow sizes provided by the applications. On the macro level, architectural bandwidth provisioning (e.g., [9]–[12]) and traffic engineering (e.g., [13]–[15]) solutions try to estimate aggregate application traffic demands to enable dynamic network resource allocation. Note that both approaches depend on predicting the future: the traffic and flow information has to be known ahead of time.
Predicting the future is inherently difficult, and most existing solutions settle for heuristic algorithms or for measuring network-level parameters, such as flow counters [9, 13] and socket buffer occupancy [10, 16]. However, these methods are in essence reacting to traffic rather than predicting it, and therefore result in poor performance [17].
More recently, an application-level traffic forecasting solution, HadoopWatch [18], derives traffic by measuring task assignments and data size indications on the file systems at the master and worker nodes in Hadoop. However, this method is customized for Hadoop, and only works when the underlying application logic is as simple as Hadoop's, which can be described in two stages: map and reduce. When the application logic becomes more complex, this method becomes unreliable and inaccurate (or even incorrect) because it does not know where, what, and when to collect useful information. For example, in Spark [3] there are multiple stages, and stages that are consecutive in time may or may not have data dependencies under lazy evaluation [3]. In fact, accurate traffic prediction requires knowledge of the time and data dependencies, which are closely related to the application logic and the corresponding representations in DCFs.
In this paper, we seek a generic and accurate method to
predict flow information for data-parallel cluster computing
frameworks. We specifically set our design goals as follows:
Generic: We should devise a general interface for traffic
prediction that works for all current and future DCFs. To
this end, we should have a general description of application
execution patterns in order to express complex application
logic.
Accurate and fine-grained: The method must be able to
provide accurate flow level information, rather than coarse
aggregated traffic demand, to enable fine-grained network
control and optimization. The method should also provide
detailed inter-flow dependency information to feed recent
coflow optimizations [7, 8].
Ahead-of-time: The method must be able to predict the
flows before they enter the network; and ideally it should
also estimate the flow establish_time accurately.
Scalable and low-overhead: The method should be able
to work at large scale and introduce as little overhead to the
DCFs as possible.

In essence, we aim to calculate the 4-tuple (source,
destination, flow_size, establish_time) for each
flow. The meaning of the first three elements is straightforward. The establish_time is used to determine the exact
time when a flow will establish (e.g., a network scheduler
will need to make a scheduling decision before the flow
establishes). Thus, we need to know both the logical order of
data processing and the locations and sizes of data partitions.
To this end, we examine prevalent DCFs and identify the key observation (details in Section II): since application logic is naturally represented by directed acyclic graphs (DAGs) in all DCFs, the DAG contains the necessary time and data dependencies for accurate flow prediction. With the DAG, we know explicitly where, what, and when to measure in order to accurately calculate flow information for complex parallel computing applications.
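To make the prediction target concrete, the 4-tuple could be represented as a simple record, as in the following Scala sketch; the type and field names are ours for illustration, not FLOWPROPHET's actual data structure:

// Hypothetical representation of the flow-information 4-tuple.
case class FlowInfo(
  source: String,       // worker node holding the data partition
  destination: String,  // worker node that will fetch the partition
  flowSize: Long,       // bytes to be transferred
  establishTime: Long   // expected time (ms) until the flow starts
)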
Based on these insights, we present FLOWPROPHET, a general framework to predict flow information for all DCFs. FLOWPROPHET extracts the DAG from data-parallel applications, and then uses the DAG to guide measurement and prediction. In the course of designing and implementing FLOWPROPHET, we make the following contributions:
• We analyze and summarize the common execution patterns of popular computing frameworks, and extract the DAG to obtain time and data dependencies from applications using these frameworks to guide flow prediction.
• We design FLOWPROPHET, a lightweight, generic, and accurate flow information prediction framework for DCFs. The application programming interface (API) of FLOWPROPHET is general, so that existing and future computing frameworks can readily use FLOWPROPHET to generate accurate flow information.
• We have implemented FLOWPROPHET on the most popular frameworks, Hadoop and Spark, and built a real testbed with 37 servers to evaluate it. Our experiments show that, ahead of time and with negligible overhead to application performance, FLOWPROPHET achieves almost 100% accuracy in source, destination, and flow size predictions.
• Using accurate prediction from FLOWPROPHET, we show that even a simple network-level optimization can greatly improve application performance. In our experiment, the job completion time of a Hadoop TeraSort-25G benchmark is reduced by 12.52% on our 37-server cluster.
The rest of this paper is organized as follows. Section II introduces the key observation that motivates us to leverage the DAG to predict flow information. Section III presents the design and implementation of FLOWPROPHET. Section IV discusses the evaluation benchmarks and results of FLOWPROPHET. Section V reviews related work. Section VI concludes the paper.
II. DAG-ASSISTED FLOW PREDICTION
In this section, we examine how the DAG assists the calculation of flow information (summarized in Figure 1). We first delve into the typical application life-cycle in popular DCFs, and then establish the relationships between application logic, execution sequence, the DAG, and data movement. Finally, we demonstrate the practical steps of flow information prediction using the DAG.
Fig. 1: Data-parallel computing framework: application logic and data movement.
Application Life-cycle: In DCFs, there is a gap between the application logic and the actual operations in the backend cluster, which may contain thousands of CPU cores, because the user application is written as if for a single machine to lower development complexity. To achieve scalable performance, DCFs automatically discover and exploit parallelism in the user's application logic, and distribute parallel computational tasks to every computing node.
The life-cycle of a user application is described in Figure 1. At the start, the user application is resolved into jobs (iterative applications with termination criteria are divided into dependent jobs: each checks the termination criterion to decide whether to move on to the next). For each job, DCFs calculate the order of executions and the data dependencies, which can be described by a DAG, as shown in Figure 1. Specifically, DCFs identify which tasks depend on which data partitions, and plan the parallel execution of the application. These tasks are aggregated into a stage. Then, the tasks in a stage are assigned to workers, and the parallel operations on the dataset are launched. The nodes in the DAG are stages, and the arcs represent dependencies between stages. Data transfer occurs only during stage transitions.
Almost all popular DCFs describe their operations with DAGs. For example, Dryad's [2] execution engine is driven by a graph description language, which empowers the developer with explicit graph construction. Pregel [20], which is based on Bulk Synchronous Parallel (BSP), adopts a sequence of supersteps to construct the user application. Every superstep contains a data communication phase and a barrier synchronization phase, which is essentially a DAG with two vertices and one edge. Spark [3] defines a novel structure named Resilient Distributed Dataset (RDD), which expresses the DAG through RDD lineage. Spark provides transformations and actions (e.g., union(), join(), filter(), map(), take(), etc.) to build the RDD lineage and explicitly express the algorithm logic. Compared with the previous frameworks, MapReduce [1] (or Hadoop [19]) is much simpler: its two primitive semantics, map and reduce, can also be regarded as a DAG containing only two vertices and one edge. CIEL [21] develops a language named Skywriting [22] and a series of operators (e.g., exec(), spawn(), map(), etc.) to express task-level parallelism in a DAG.
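To illustrate how such a DAG arises from ordinary application code, the following Spark sketch in Scala builds an RDD lineage with lazy transformations; nothing executes until the final action. The SparkContext sc and the input paths are assumptions made for this example:

// Lazy transformations only record lineage (the DAG); the count() action
// triggers execution, and the join introduces a shuffle (a stage boundary).
val clicks = sc.textFile("hdfs://input/clicks").map(_.split(","))
val users  = sc.textFile("hdfs://input/users").map(_.split(","))
val joined = clicks.map(c => (c(0), c)).join(users.map(u => (u(0), u)))
val premiumClicks = joined.filter { case (_, (_, u)) => u(1) == "premium" }.count()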

Fig. 2: Data movement patterns. (a) Data shuffle between mappers and reducers in Hadoop [19]; (b) data channels between computing vertices in Dryad [2]; (c) data communication in one superstep of Bulk Synchronous Parallel (BSP) in Pregel [20]; (d) data shuffle between stages in Spark [3].
Observation: The DAG contains the necessary time, data, and flow dependencies for accurate flow prediction.
Time dependency: Time dependency refers to the execution order of stages. DCFs process the DAG one node (stage) at a time in a depth-first-traversal order [3], which yields this order. Some stages may execute in parallel, while others have to wait for the completion of their parent stages. Traffic is generated only between parent and child stages, so with the DAG we know when each flow transmission will occur.
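A minimal sketch of this ordering, using our own simplified representation of the DAG (a map from each stage to its parent stages), is shown below; it is illustrative only and not the scheduler of any particular DCF:

// Post-order depth-first traversal in Scala: every stage is scheduled only
// after all of its parent stages, matching the execution order described above.
def executionOrder(finalStage: Int, parents: Map[Int, List[Int]]): List[Int] = {
  val ordered = scala.collection.mutable.LinkedHashSet[Int]()
  def visit(stage: Int): Unit =
    if (!ordered.contains(stage)) {
      parents.getOrElse(stage, Nil).foreach(visit)  // visit parents first
      ordered += stage
    }
  visit(finalStage)
  ordered.toList
}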
Data dependency: DCFs maintain the life cycle of data: import, transfer, storage, and export. First, data imported into the cluster is split and distributed across the entire cluster. Then, DCFs assign computation tasks to each node based on data locality and the resource scheduling scheme. Along with the execution of computation tasks, intermediate data is generated and cached locally. In Hadoop (Figure 2(a)), a JobTracker informs reducers when and where (i.e., from which mapper node) to fetch data to perform reduce tasks. In Dryad, data channels are maintained between computing vertices (Figure 2(b)), and data flows along these channels. For Pregel, a superstep requires all the computing nodes to exchange data through barrier synchronization before the next superstep (Figure 2(c)). For Spark, data shuffles take place between specific stages based on the dependencies recorded in the RDD lineage (Figure 2(d)).
In summary, since every step of the data life cycle is conducted by the DCF, DCFs are capable of exporting the location and size of every piece of intermediate data and of the final results. Since traffic is essentially data movement, flow prediction requires knowing where, what, and when data is moved, and such information can be retrieved from the DAG. When a stage (a node in the DAG) relies on the output of a group of stages (every stage in this group is called the stage's parent), it has to wait until all the parents are finished. Concurrently running stages do not have data dependencies on each other. Thus, we can infer from the DAG the source (parent stages), destination (child stage), size (amount of data required), and time (upon completion of all parent stages) of the data transmitted between stage transitions.
Flow dependency: The data flows generated between consecutive stages are inter-dependent, because they usually share common communication requirements and objectives (Figure 2). Flow dependency relates to the important concept of a coflow [23], which is a semantically related collection of flows. We observe that edges in the DAG can naturally be used to identify coflows in DCFs, which provides valuable information for coflow-based optimization mechanisms such as [7, 8].
Calculating flow information with DAG: Inspired by these observations, we can design a general method to calculate the flow information 4-tuple, (source, destination, flow_size, establish_time), by developing a set of interfaces to: 1) output the stage context (the current stage, the next stage, and the dependency between them), and 2) extract the locations and sizes of data partitions.
Fig. 3: An example of establish_time.
At the high level, the 4-tuple is calculated as follows
(detailed design and implementation in Section III):
source: we look at the current stages in the DAG, and identify the data partitions that need to be transferred. The worker node containing the data is the source.
destination: we look at the next stages in the DAG, and identify which worker node will work on which piece of data. Thus, the destinations of the data can be identified.
flow_size: we use the interface to look up the sizes of the data partitions to be transmitted.
establish_time: as depicted in Figure 3, FLOWPROPHET outputs the prediction information of a flow at the Prediction Output Time, and the flow begins at the Flow Start Time. The establish_time is defined as the time period between the Prediction Output Time and the Flow Start Time. We develop a heuristic algorithm to estimate the expected establish time intervals for subsequent flows. This algorithm is adaptive to the application and the DCF.
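Using our own simplified data structures (not FLOWPROPHET's code), the first three elements of the 4-tuple can be derived by joining the next stage's task list with the recorded partition status, roughly as in this Scala sketch:

// For each task of the next stage, the partition it fetches gives the source
// and flow_size, and the executor it runs on gives the destination.
case class Task(partitionId: Int, executorHost: String)
case class PartitionStatus(partitionId: Int, host: String, size: Long)

def predictFlows(nextStageTasks: Seq[Task],
                 partitions: Map[Int, PartitionStatus]): Seq[(String, String, Long)] =
  for {
    task      <- nextStageTasks
    partition <- partitions.get(task.partitionId).toSeq
    if partition.host != task.executorHost  // a purely local read generates no network flow
  } yield (partition.host, task.executorHost, partition.size)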
III. FLOWPROPHET DESIGN AND IMPLEMENTATION
We introduce the design and implementation of FLOWPROPHET in this section. First, we dissect flow information prediction in DCFs into several sub-problems and describe our solutions (§ III-A). Then, we present the workflow of FLOWPROPHET to show how the different components work together
(§ III-B). Finally, we go through the implementation details of each component of FLOWPROPHET in § III-C.

Fig. 4: The architecture of FLOWPROPHET.
A. FLOWPROPHET Overview
Figure 4 depicts the architecture of FLOWPROPHET, which
contains 4 modules: DAG Builder, Data Tracker, Data Ag-
gregator, and Flow Calculator (functions explained below).
FLOWPROPHET is attached to DCFs to enable flow pre-
diction. When implementing a general framework to pre-
dict the 4-tuple (source, destination, flow_size,
establish_time) for every upcoming flow in DCFs, we
are essentially solving the following sub-problems:
How to extract the full DAG? The DAG is the pivot for
predicting flow information for DCFs. On the master node of
DCFs, the DAG Builder builds a full DAG by parsing event
messages from the DCF master interfaces.
How to collect data partition status? When a stage is completed, the computation results are kept as data partitions in the local disk or local memory of each worker node. A data partition status contains the stage_ID, partition_ID, and size. The Data Tracker receives event messages from the DCF worker interfaces and maintains a data structure to record all data partition statuses. The Data Aggregator requests the status of each data partition from the Data Tracker on each worker.
How to be scalable and lightweight? We pursue scalability and low overhead in the design of FLOWPROPHET. All modules in FLOWPROPHET follow the principles of the Actor Model to exchange messages. The Actor Model is an asynchronous programming model for distributed applications [24]. Actors are fairly lightweight concurrent entities that process messages asynchronously using an event-driven receive loop. The Actor Model offers a high level of abstraction for achieving high concurrency and parallelism.
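As a minimal sketch of this style (not FLOWPROPHET's actual actors), an Akka actor with an event-driven receive loop could look as follows; the message type, actor name, and behavior are our assumptions:

import akka.actor.{Actor, ActorSystem, Props}

case class StageFinished(stageId: Int)  // hypothetical event message

class DagBuilderActor extends Actor {
  def receive = {                       // asynchronous, event-driven receive loop
    case StageFinished(id) =>
      // e.g., trigger collection of partition status for stage `id`
      println(s"stage $id finished")
  }
}

object ActorSketch extends App {
  val system  = ActorSystem("flowprophet-sketch")
  val builder = system.actorOf(Props[DagBuilderActor](), "dag-builder")
  builder ! StageFinished(3)            // the message lands in the actor's mailbox
}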
B. FLOWPROPHET Workflow
Figure 5 depicts how the modules of FLOWPROPHET cooperate to predict upcoming flows when a stage is finished.
Fig. 5: Sequence diagram beginning with the event that the current stage is finished.
When the DAG Builder receives a message that the current stage is finished, it checks whether there will be traffic between the current stage and the next stage. If yes, the DAG Builder sends the current stage ID to ask the Data Aggregator to collect data partition status from each Data Tracker. After the Data Aggregator finishes the collection, FLOWPROPHET knows the locations and sizes of all data partitions. Then, when the DAG Builder is notified that a new stage is
beginning, it sends the stage context to the Flow Calculator. The stage context contains the tasks and the parent stage IDs of the next stage. Each task is identified by (partition_ID, executor_ID, func). The Flow Calculator then combines and matches the task list and stage list with the data partition status list to output the (source, destination, flow_size) for each flow. Note that task failures cause the corresponding data partitions to be transmitted again. FLOWPROPHET handles task failures as follows: each Data Tracker receives task failure events from the DCF worker and notifies the Flow Calculator of the extra flow information. Finally, the Flow Calculator obtains the establish_time with a heuristic algorithm.
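To make the exchanges in Figure 5 concrete, the messages could be modeled as simple immutable case classes, as in the hypothetical sketch below; the names and shapes are ours, not FLOWPROPHET's actual protocol:

// Hypothetical messages mirroring the sequence diagram of Fig. 5.
case class CurrentStageFinished(stageId: Int)                        // DCF master  -> DAG Builder
case class CollectPartitionStatus(stageId: Int)                      // DAG Builder -> Data Aggregator
case class DataPartitionStatus(stageId: Int, partitionId: Int,
                               size: Long, location: String)         // Data Tracker -> Data Aggregator
case class NextStageStart(tasks: List[(Int, String)],                // (partitionId, executorHost)
                          parentStageIds: List[Int])                 // DAG Builder -> Flow Calculator
case class TaskFailure(taskId: Int, stageId: Int, partitionId: Int)  // DCF worker  -> Data Tracker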
C. FLOWPROPHET Implementation
We now describe the implementation of the four modules of FLOWPROPHET in detail. We implement FLOWPROPHET in Scala 2.10.4. We apply the actor model based on the Akka 2.3.4 framework [25], which enables the FLOWPROPHET modules to communicate asynchronously and concurrently at low overhead. In addition, to export DCF-intrinsic information, we have implemented the APIs for the master and workers of Spark 1.0.0 and Hadoop 0.20.2.
Event Definition                      | Trigger Condition
newStageEvent(stageID, childStageID)  | a new stage is created
stageStartEvent(List[task], stageID)  | a stage is beginning
stageFinishedEvent(stageID)           | a stage is finished
TABLE I: The required APIs for the DCF master.
DAG Builder: The DAG Builder relies on information provided by the DCF to build a full DAG. DCF developers only need to implement a set of simple interfaces providing primitive events, which are outlined in Table I. Similarly to the DAG Builder, the Data Tracker also requires notifications of events from the DCF worker.
DAGBuilder Handlers
newStageHandler(newStageEvent)            | (currentStage, childStage)
stageStartHandler(stageStartEvent)        | Event(List[task], List[stageID])
stageFinishedHandler(stageFinishedEvent)  | Event(stageID)
TABLE II: The DAG Builder event handlers.
When a new stage is created in the DCF, a newStageEvent is raised, from which the DAG Builder obtains the new stage ID and its child stage ID. Using the handlers defined in Table II, the DAG Builder constructs a full DAG from all the collected pairs of parent and child stages.
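A sketch of the thin instrumentation a DCF master might add to raise the Table I events toward the DAG Builder is shown below; the event shapes follow Table I, while the class and hook names are our assumptions:

import akka.actor.ActorRef

case class NewStageEvent(stageId: Int, childStageId: Int)
case class StageStartEvent(taskIds: List[Int], stageId: Int)  // tasks abbreviated to IDs here
case class StageFinishedEvent(stageId: Int)

// Hypothetical hooks a DCF master could call from its scheduler to
// notify the DAG Builder actor of stage life-cycle events.
class MasterInstrumentation(dagBuilder: ActorRef) {
  def onStageCreated(stageId: Int, childStageId: Int): Unit =
    dagBuilder ! NewStageEvent(stageId, childStageId)
  def onStageStart(taskIds: List[Int], stageId: Int): Unit =
    dagBuilder ! StageStartEvent(taskIds, stageId)
  def onStageFinished(stageId: Int): Unit =
    dagBuilder ! StageFinishedEvent(stageId)
}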

DCFs process stages in a depth-first-traversal order, and traffic does not always take place between two consecutive stages. To provide accurate prediction, it is therefore necessary to check the data dependency between the current stage and the next stage. For example, in job #n of Figure 1, traffic only happens at the following three moments: after stage 2 and stage 3 both complete, after stage 5 completes, and after stage 1 and stage 4 both complete.
Furthermore, the stageStartEvent contains a list of tasks and the stage ID. In each task, the executor_ID is where the task is to be executed; the partition_ID indicates the data partition that the task will fetch; and the func is a set of nested procedures that can be executed independently.
Data Aggregator: To manage all the Data Trackers, we place
a Data Aggregator on the master, which organizes partition
status from Data Trackers and exports a query interface for
the Flow Calculator (Table III).
DataAggregator Methods                                      | Caller
query(List[partitionID, stageID]): List[(location, size)]   | FlowCalculator
TABLE III: The Data Aggregator API.
When the Data Aggregator receives a stage ID from the DAG Builder, it broadcasts the stage ID to all the Data Trackers. Each Data Tracker then replies with a list of data partition statuses for that stage ID. The Data Aggregator then builds a HashMap to cache these data partition statuses with the stage ID as the key. In addition, the Data Aggregator appends to each data partition status a location field, which is the IP address or hostname of the worker that keeps the data partition.
In DCFs, there can be thousands of workers or more, which means there are the same number of Data Trackers. Leveraging the Actor Model, all the messages sent from the Data Tracker actors are placed in the mailbox of the Data Aggregator actor, and the Data Aggregator processes them in an asynchronous, non-blocking way.
Once the Data Aggregator receives a query request from
the Flow Calculator, it will reply with a list of location and
size for each data partition matching the stage ID.
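A minimal sketch of such a cache, under our own assumed types (an illustration, not FLOWPROPHET's implementation), could look like this:

import scala.collection.mutable

case class PartitionRecord(stageId: Int, partitionId: Int, size: Long, location: String)

// Partition status cached per stage ID, with the reporting worker's
// host stored as the location field.
class AggregatorCache {
  private val byStage = mutable.HashMap.empty[Int, mutable.Buffer[PartitionRecord]]

  def record(workerHost: String, stageId: Int, partitionId: Int, size: Long): Unit =
    byStage.getOrElseUpdate(stageId, mutable.Buffer.empty[PartitionRecord]) +=
      PartitionRecord(stageId, partitionId, size, workerHost)

  // Table III-style query: (location, size) for the requested partitions of a stage.
  def query(stageId: Int, partitionIds: Seq[Int]): Seq[(String, Long)] =
    byStage.getOrElse(stageId, mutable.Buffer.empty[PartitionRecord])
      .filter(r => partitionIds.contains(r.partitionId))
      .map(r => (r.location, r.size))
      .toSeq
}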
Data Tracker: Similar to the DAG Builder, which relies on primitive information from the DCF master, a Data Tracker receives and records event messages from the DCF worker. The event messages are defined in Table IV.
Event Definition                                 | Trigger Condition
taskFailureEvent(taskID, stageID, partitionID)   | a task has failed
taskFinishedEvent(stageID, partitionID, size)    | a task is finished
TABLE IV: The required APIs for the DCF worker.
The computation takes place on each worker in DCFs, i.e., the func encapsulated in each task is extracted and executed by executors. In general, the computation results are written back to the local disk (e.g., in Hadoop) or, for high performance, kept in local memory (e.g., in Spark). Moreover, most DCFs are designed to be fault-tolerant, and they only attempt re-execution of failed tasks a limited number of times. To predict the extra flows generated by task re-execution, the Data Tracker needs to be notified of failed tasks. The required APIs are simple to implement, adding fewer than 50 lines of code in the DCF task life-cycle context.
The Data Tracker constructs a HashMap with the stage ID as the key and a list of partition IDs and sizes as the value. The Data Tracker updates the HashMap when a taskFinishedEvent is raised by the DCF interface. Then, when the Data Aggregator requests the status of the data partitions of a stage ID, the Data Tracker replies with a list in which each data partition is recorded as (stage_ID, partition_ID, size).
DataTracker Methods                                   | Caller
query(stageID): List[(stageID, partitionID, size)]    | DataAggregator
TABLE V: The Data Tracker API.
Flow Calculator: The Flow Calculator is the converging point of the knowledge on time dependency and data dependency: it calculates the flow information (source, destination, flow_size) and estimates the flow establish_time.
Flow information: Once the DAG Builder captures the stageStartEvent, it delivers two lists to the Flow Calculator: one contains the tasks that are just starting, and the other contains all the parent stage IDs. By traversing the list of tasks, the Flow Calculator queries the Data Aggregator for the location and size of the data partition that each task will fetch. Thus, the location of the data partition is the source, the executor_ID indicates the destination, and the size of the data partition is the traffic volume flow_size. Since the predicted flows will not take place until all the tasks on the master have been delivered to the designated workers, the Flow Calculator will most likely export flow information in advance. As shown in our experiments in Section IV, FLOWPROPHET can predict flow information strictly ahead of time.
Flow establish time: FLOWPROPHET is able to calculate the flow information of the next stage ahead of time. After the current stage is completed, DCFs usually perform a relatively fixed number of operations to start the next stage, and we refer to this period of time as the flow establish_time. For a specific application, the establish_time is likely to fall within a range. This is confirmed by our experiments (Figure 7), in which the establish_times all exhibit heavy-tailed distributions in different DCFs. The majority of establish_times concentrate in a small range, with occasional outliers (e.g., due to network congestion).
However, different configurations of DCFs and applications may result in different establish_times, and it is difficult to predict them accurately for all DCFs and all applications. Therefore, we introduce an adaptive algorithm to infer the establish_time of flows for different applications.
For an application, the algorithm tracks the average and variance of the establish_time of the previous flows via the exponentially weighted moving average (EWMA) method [26]. EWMA has less lag than the naive moving average method and is more sensitive to recent establish_times, which fits our goal of tracking the current application. We describe the estimation method as follows:
Let t_i be the expected establish time of the i-th stage and σ the standard deviation. It follows that the establish_time
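A standard EWMA tracker of the mean and deviation of establish_time, along the lines the paragraph above describes, might look like the following sketch; the smoothing factor alpha and the exact update rule are our assumptions, not the paper's formula:

// EWMA of the mean and absolute deviation of observed establish_times,
// similar in spirit to classic RTT estimation.
class EwmaEstablishTime(alpha: Double = 0.25) {
  private var mean: Double = 0.0
  private var dev: Double  = 0.0
  private var seeded = false

  def update(observedMs: Double): Unit =
    if (!seeded) { mean = observedMs; seeded = true }
    else {
      val err = observedMs - mean
      mean += alpha * err                               // EWMA of the mean
      dev = (1 - alpha) * dev + alpha * math.abs(err)   // EWMA of the deviation
    }

  def expectedMs: Double  = mean
  def deviationMs: Double = dev
}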

References
[1] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," OSDI 2004; also CACM 51(1), 2008.
[2] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks," EuroSys 2007.
[3] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: Cluster computing with working sets," HotCloud 2010.
[20] G. Malewicz et al., "Pregel: A system for large-scale graph processing," SIGMOD 2010.