FLOWPROPHET: Generic and Accurate Traffic Prediction for Data-Parallel Cluster Computing
Citations
An Efficient Online Algorithm for Dynamic SDN Controller Assignment in Data Center Networks
Stochastic Configuration Networks Based Adaptive Storage Replica Management for Power Big Data Processing
Multi-resource Load Balancing for Virtual Network Functions
Adaptive scheduling of parallel jobs in spark streaming
Proceedings of the 1983 ACM SIGMOD international conference on Management of data
References
MapReduce: simplified data processing on large clusters
Spark: cluster computing with working sets
Pregel: a system for large-scale graph processing
Dryad: distributed data-parallel programs from sequential building blocks
Frequently Asked Questions (14)
Q2. What future work do the authors describe in "FLOWPROPHET: Generic and Accurate Traffic Prediction for Data-Parallel Cluster Computing"?
The authors make sure that the application programming interfaces (APIs) of FLOWPROPHET are general, so that existing and future computing frameworks can readily deploy FLOWPROPHET to generate accurate flow predictions. The authors also show that simple network optimizations with ahead-of-time flow predictions can provide substantial improvement in application performance.
Q3. How does DAG help to achieve scalable performance?
To achieve scalable performance, data-parallel computing frameworks (DCFs) automatically discover and exploit parallelism in the user's application logic, and distribute parallel computational tasks to every computing node.
Q4. What is the way to measure the time in a distributed setting?
For accurate time measurement in a distributed setting, the authors deploy NTP [28] on the master node and worker nodes to synchronize their system clocks.
Q5. What is the function that is attached to the data tracker?
The Data Tracker receives event messages from DCF worker interfaces and maintains a data structure to record all data partition status.
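The bookkeeping described above can be illustrated with a minimal sketch. It is not FLOWPROPHET's actual implementation: the event fields (`dataset_id`, `partition_id`, `host`, `size_bytes`) and the class and method names are hypothetical, chosen only to show how event messages from workers could be folded into a partition status table.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PartitionEvent:
    """Hypothetical event message sent by a DCF worker interface."""
    dataset_id: int
    partition_id: int
    host: str          # worker that currently holds the partition
    size_bytes: int    # size of the partition on that worker


class DataTracker:
    """Illustrative tracker: records the latest status of every partition."""

    def __init__(self) -> None:
        # (dataset_id, partition_id) -> (host, size_bytes)
        self._status: Dict[Tuple[int, int], Tuple[str, int]] = {}

    def on_event(self, event: PartitionEvent) -> None:
        # A later event for the same partition overwrites the earlier one.
        key = (event.dataset_id, event.partition_id)
        self._status[key] = (event.host, event.size_bytes)

    def lookup(self, dataset_id: int, partition_id: int) -> Optional[Tuple[str, int]]:
        # Returns None when no event has been seen for this partition.
        return self._status.get((dataset_id, partition_id))
```

Keying the table by partition keeps lookups constant-time, which matters if a downstream component queries it once per task input.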
Q6. What is the function that handles task failures?
FLOWPROPHET handles task failures as follows: the Data Tracker receives task failure events from the DCF worker and notifies the Flow Calculator of the extra flow information.
Q7. How long does FLOWPROPHET take to predict traffic?
In addition, as Hadoop spends much more time reading and writing data on disk while Spark accesses data in memory directly, FLOWPROPHET manages to achieve a larger lead time on Hadoop.
Q8. How does the Flow Calculator predict the flow of a task?
Since the predicted flows will not take place until all the tasks on the master are delivered to the designated workers, the Flow Calculator will most likely export flow information in advance.
Q9. What is the function of the Flow Calculator?
The Flow Calculator then combines and matches the task list and stage list with the data partition status list to output a (source, destination, flow_size) tuple for each flow.
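As a rough sketch of this matching step, the function below joins a stage list, a task list, and a partition status table into (source, destination, flow_size) tuples. The data layout is assumed for illustration only: `Task`, `stages`, `tasks`, and `partition_status` are hypothetical names, and the rule that a partition already on the task's host produces no network flow is an assumption of this sketch, not something stated in the source.

```python
from typing import Dict, List, NamedTuple, Tuple


class Flow(NamedTuple):
    source: str
    destination: str
    flow_size: int


class Task(NamedTuple):
    host: str            # worker the task is assigned to (hypothetical field)
    inputs: List[str]    # ids of the partitions the task will read


def predict_flows(
    stage_id: int,
    stages: Dict[int, List[int]],                  # stage id -> task ids
    tasks: Dict[int, Task],                        # task id -> task
    partition_status: Dict[str, Tuple[str, int]],  # partition id -> (host, size)
) -> List[Flow]:
    flows: List[Flow] = []
    for task_id in stages[stage_id]:
        task = tasks[task_id]
        for partition_id in task.inputs:
            status = partition_status.get(partition_id)
            if status is None:
                continue  # partition not reported yet, nothing to predict
            source, size = status
            if source != task.host:  # assumption: local reads create no flow
                flows.append(Flow(source, task.host, size))
    return flows
```

Because the inputs are plain lookups, the prediction can be produced as soon as the tasks of a stage are assigned, before any data moves.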
Q10. What is the purpose of extending the argument lists of FLOWPROPHET?
To enable FLOWPROPHET in a multi-tenant cluster, the authors plan to extend the argument lists of FLOWPROPHET APIs with user IDs (stages, jobs, tasks, and flows will be tagged with a user ID).
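One way to picture the planned user ID tagging is a flow record that carries an extra field. The sketch below is purely illustrative: `TaggedFlow` and `bytes_per_user` are hypothetical names, and the source does not specify how the extended APIs would look.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TaggedFlow:
    """Hypothetical flow record extended with a user ID."""
    user_id: str
    source: str
    destination: str
    flow_size: int


def bytes_per_user(flows: Iterable[TaggedFlow]) -> Dict[str, int]:
    # Sum predicted flow sizes per tenant, e.g. as input to a sharing policy.
    totals: Dict[str, int] = defaultdict(int)
    for flow in flows:
        totals[flow.user_id] += flow.flow_size
    return dict(totals)
```

With the tag in place, a consumer of the predictions can separate or aggregate traffic per tenant without changing the prediction logic itself.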
Q11. What is the description of FLOWPROPHET?
FLOWPROPHET offers simple, flexible, and fine-grained interfaces to predict flow information, and these interfaces are able to adapt to a wide range of scenarios.
Q12. How many workers are used to complete a job?
In Figure 14 and Figure 15, as the number of workers increases, more parallel computing resources are utilized and therefore the job completion time decreases gradually.
Q13. How does FLOWPROPHET achieve the accuracy in source, destination and flow size predictions?
The authors conclude that FLOWPROPHET achieves high (almost 100%) accuracy in source, destination and flow size predictions for both Spark and Hadoop.
Q14. What is the definition of establish time?
After the current stage is completed, DCFs usually perform a relatively fixed number of operations to start the next stage, and the authors refer to this period of time as the flow establish_time.
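Read as arithmetic, this definition is the gap between two timestamps. The sketch below assumes both timestamps are taken on synchronized clocks and expressed in seconds; the function names and the idea of averaging several samples are illustrative assumptions, not the authors' measurement procedure.

```python
from typing import Iterable, Tuple


def establish_time(stage_completed_at: float, next_stage_started_at: float) -> float:
    """Time between the end of one stage and the start of the next."""
    if next_stage_started_at < stage_completed_at:
        raise ValueError("next stage cannot start before the current stage completes")
    return next_stage_started_at - stage_completed_at


def mean_establish_time(samples: Iterable[Tuple[float, float]]) -> float:
    # Each sample is (stage_completed_at, next_stage_started_at).
    gaps = [establish_time(done, start) for done, start in samples]
    if not gaps:
        raise ValueError("at least one sample is required")
    return sum(gaps) / len(gaps)
```

Since the number of operations between stages is described as relatively fixed, an average over a few observed stage transitions is one plausible way to estimate the value for a given cluster.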