Spark: cluster computing with working sets
Citations
17,433 citations
Cites background from "Spark: cluster computing with working sets"
...There has also been some recent work on alternative MapReduce systems that are specifically designed for iterative computation, which are likely better suited for ADMM [25, 179], though the implementations are less mature and less widely available....
[...]
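The point about iterative engines suiting ADMM comes down to data reuse: every ADMM round re-reads the same working set, which a chain of fresh MapReduce jobs re-materializes from disk each time. A minimal sketch of that access pattern against Spark's Scala API follows; the input path, the scalar consensus variable z, and the solveLocal subproblem are illustrative assumptions, not anything from the cited papers.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the load-once, iterate-many pattern that ADMM-style methods
// rely on. A real ADMM x-update would minimize a local objective plus a
// quadratic penalty toward the consensus variable z.
object IterativeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("iterative-sketch"))
    // Hypothetical input; cache() keeps partitions in memory so each
    // iteration avoids a fresh scan from disk.
    val blocks = sc.textFile("hdfs://example/data").cache()
    val n = blocks.count()
    var z = 0.0 // consensus variable, shipped to workers via the closure
    for (_ <- 1 to 50) {
      val sum = blocks.map(b => solveLocal(b, z)).reduce(_ + _)
      z = sum / n // z-update: average of the local solutions
    }
    sc.stop()
  }

  // Hypothetical per-block subproblem solver, a trivial placeholder here.
  def solveLocal(block: String, z: Double): Double =
    block.length.toDouble + z
}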
2,141 citations
Cites methods from "Spark: cluster computing with working sets"
...nodes and stores configuration information [82]
Spark: a fast and general computation engine for Hadoop data [83]
Chukwa: a data collection and analysis framework incorporated with MapReduce and HDFS that has just passed its development stage; the workflow of Chukwa allows for data collection from distributed systems, data processing, and data storage in Hadoop; as an independent module, Chukwa is included in the Apache Hadoop distribution [76]
Twister: provides support for iterative MapReduce computations and is much faster than Hadoop
MapR: a comprehensive distribution for Apache Hadoop and HBase...
[...]
2,006 citations
Cites background or methods from "Spark: cluster computing with working sets"
...Examples of alternative programming models that are becoming available on YARN are: Dryad [18], Giraph, Hoya, REEF [10], Spark [32], Storm [4] and Tez [2]....
[...]
...Spark is an open-source research project from UC Berkeley [32] that targets machine learning and interactive querying workloads....
[...]
...If one composes this flow as a sequence of MapReduce jobs, the scheduling overhead will significantly delay the result [32]....
[...]
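The scheduling-overhead point is about job granularity: a chain of MapReduce jobs pays a job submission and an HDFS round-trip per stage, while Spark composes the stages lazily into a single job. A minimal sketch, assuming the current Spark core API and an invented log-filtering flow (paths and tokenization are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: a multi-stage flow expressed as one lazily evaluated Spark
// job. The filter/flatMap/map steps are pipelined within tasks; only
// reduceByKey forces a shuffle, so no intermediate result is written to
// HDFS and no per-job scheduling cost is paid between stages.
object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipeline-sketch"))
    sc.textFile("hdfs://example/logs")      // hypothetical input path
      .filter(_.contains("ERROR"))          // stays in memory
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                   // the single shuffle boundary
      .saveAsTextFile("hdfs://example/out") // one action, one job
    sc.stop()
  }
}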
1,786 citations
Cites methods from "Spark: cluster computing with working sets"
...For example, at time 350, when both Spark and the Facebook Hadoop framework have no running jobs and Torque is using 1/8 of the cluster, the large-job Hadoop framework scales up to 7/8 of the cluster....
[...]
...We evaluated the benefit of running iterative jobs using the specialized Spark framework we developed on top of Mesos (Section 5.3) over the general-purpose Hadoop framework....
[...]
...To validate our hypothesis that specialized frameworks provide value over general ones, we have also built a new framework on top of Mesos called Spark, optimized for iterative jobs where a dataset is reused in many parallel operations, and shown that Spark can outperform Hadoop by 10x in iterative machine learning workloads....
[...]
...The longer time for the first iteration in Spark is due to the use of slower text parsing routines....
[...]
...• Spark running a series of machine learning jobs....
[...]
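The 10x result and the slow first iteration in the excerpts above come from the same workload shape: iterative logistic regression over a cached dataset, where iteration 1 pays the text-parsing cost and later iterations read from memory. The sketch below stays close to the example published in the Spark paper, restated against the current Scala API; the input path, feature dimension D, and parsePoint format are illustrative assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import scala.math.exp
import scala.util.Random

// Sketch of the paper's iterative logistic-regression workload. The first
// pass parses text and populates the cache (the slow first iteration noted
// in the excerpt); every later iteration reuses the in-memory points.
object LogisticRegressionSketch {
  final case class Point(x: Array[Double], y: Double)

  val D = 10 // illustrative feature dimension

  def parsePoint(line: String): Point = {
    // Hypothetical format: label followed by D space-separated features.
    val parts = line.split("\\s+").map(_.toDouble)
    Point(parts.tail, parts.head)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lr-sketch"))
    val points = sc.textFile("hdfs://example/points").map(parsePoint).cache()

    var w = Array.fill(D)(2 * Random.nextDouble() - 1)
    for (_ <- 1 to 10) {
      // Gradient of the logistic loss, summed across the cached dataset.
      val gradient = points.map { p =>
        val dot = w.zip(p.x).map { case (wi, xi) => wi * xi }.sum
        val scale = (1.0 / (1.0 + exp(-p.y * dot)) - 1.0) * p.y
        p.x.map(_ * scale)
      }.reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
      w = w.zip(gradient).map { case (wi, gi) => wi - gi }
    }
    println(s"final w: ${w.mkString(", ")}")
    sc.stop()
  }
}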
References
2,867 citations
"Spark: cluster computing with worki..." refers methods in this paper
...MapReduce [11] pioneered this model, while systems like Dryad [16] and Map-Reduce-Merge [23] generalized the types of data flows supported....
[...]