An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination
Summary
Introduction
- As the popularity of big data analytics has continued to grow, so has the need for accessible and scalable machine-learning implementations.
- In recent years, Apache Spark’s machine-learning library, MLlib, has been used to fulfill this need.
- For all four of the applications tested, Smart-MLlib outperformed Spark’s MLlib in every configuration by at least 90%.
- On average, Smart-MLlib outperformed Spark’s MLlib by over 800%.
Acknowledgments
- I would like to start by thanking my advisor, Dr. Gagan Agrawal, for his support and guidance throughout my career at Ohio State.
- I would also like to thank my parents, Adam and Lorna, for their support and advice throughout my college career.
- The trend of data-driven decision making has created a new multi-billion dollar big-data technology and services industry [11].
1.1 Motivation
- In part to demonstrate the support for cyclic dataflow in Spark, its makers implemented a full machine-learning library, coined MLlib, on top of the framework.
- This library not only provides concrete examples of iterative applications in Spark, but also provides an extremely user-friendly API for production-quality, machine-learning algorithms.
- For many of the algorithms, users need less than 20 lines of code to build complex statistical models from stored semi-structured data [24].
- Other distributed-computing frameworks, such as the authors’ Smart (in-Situ MApReduce liTe) system, have been shown to outperform Spark by at least an order of magnitude for some machine-learning tasks [28].
- Building a machine-learning library that mimics Spark’s MLlib on top of Smart could lead to increased performance without compromising the easy-to-use interface.
1.2 Our Contributions
- The authors present a machine-learning library prototype, comparable to Spark’s MLlib, built on top of their Smart system.
- In addition to presenting the specific Smart-MLlib applications in this paper, the authors will also describe the underlying architecture of their system.
- The main focus of their architectural discussion revolves around launching Smart’s native jobs from within Scala’s Java virtual machine (JVM) environment.
- After describing the complications surrounding this issue, the authors explain why the scala.sys.process package was chosen as the mechanism for launching Smart jobs.
- In addition to these performance results, the authors also show that Smart-MLlib scales better than Spark’s MLlib in almost all cases.
1.3 Organization
- This thesis is organized in the following way: Chapter 2 discusses the background and motivation for MapReduce, Spark, Smart, and MLlib.
- Chapter 4 covers each of Smart-MLlib’s algorithms by giving a brief overview, exploring the Smart implementation, presenting the Smart-MLlib API, and comparing the usage of Smart-MLlib to the usage of Spark’s MLlib for the same algorithm.
- Chapter 5 presents and analyzes the experiments that were conducted to compare the libraries.
- Finally, Chapter 6 concludes the thesis with an overview of what was covered and provides topics for further work.
- First, the authors cover the history and basic ideas of the MapReduce programming model.
2.1 MapReduce
- In the early 2000s, distributed computing was already making a large impact on the technology industry [2].
- The ability to leverage clusters of computers to speed up computation was motivating Google and many other companies to implement hundreds of special-purpose distributed applications [4].
- Straightforward algorithms were obfuscated by the details required to build distributed algorithms: explicit parallelization of the computation, intelligent distribution of the data, etc.
- The model provides a very simple API containing only two core functions: map and reduce.
- This simple dataflow, coupled with MapReduce’s straightforward, functional-style API, has made the programming model very popular for a variety of applications [7, 5].
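To make the two-function model concrete, below is a minimal word-count sketch in Scala. It is illustrative only: it runs on a single machine and mimics the map/shuffle/reduce dataflow rather than using Google’s or Hadoop’s actual APIs.

```scala
// A minimal, single-machine sketch of the MapReduce model's two core
// functions (illustrative only; not the Hadoop or Google API).
object WordCount {
  // map: emit a (word, 1) pair for every word in an input line
  def map(line: String): Seq[(String, Int)] =
    line.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toSeq

  // reduce: combine all values that share a key into a single result
  def reduce(key: String, values: Seq[Int]): (String, Int) =
    (key, values.sum)

  def main(args: Array[String]): Unit = {
    val lines = Seq("the quick brown fox", "the lazy dog")
    val counts = lines
      .flatMap(map)                // map phase: one pair per word
      .groupBy(_._1)               // shuffle: group pairs by key
      .map { case (word, pairs) => reduce(word, pairs.map(_._2)) }
    counts.foreach(println)        // e.g. (the,2), (fox,1), ...
  }
}
```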
2.2 Spark
- Spark was introduced in 2010 to fulfill the need for a general-purpose, parallel-processing framework with built-in support for nonlinear dataflows [30].
- In addition to defining these new distributed datasets, Spark defined a series of operations on RDDs that supported parallel computation.
- RDD operations can be loosely grouped into two categories: transformations and actions [25].
- Examples of actions include reduce and collect.
- Finally, an action is performed and the program is terminated.
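The split between transformations and actions can be seen in a short program against Spark’s Scala API: map and filter only record lineage, and nothing executes until the reduce action runs (the local[*] master below is just for demonstration).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-demo").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 1000)   // build an RDD
    val squares = numbers.map(n => n * n)     // transformation: lazy
    val evens   = squares.filter(_ % 2 == 0)  // transformation: lazy

    val sum = evens.reduce(_ + _)             // action: triggers the job
    println(s"sum of even squares = $sum")

    sc.stop()
  }
}
```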
2.3 Smart
- Smart [28] is the next generation of a parallel-computing framework that has evolved from FREERIDE (FRamework for Rapid Implementation of Data Mining Engines) [13] and MATE (Map-reduce with an AlternaTE API) [12].
- Instead of map and reduce phases of computation, Smart uses reduction and combination phases.
- More specifically, Smart processes data in the following way.
- After every data chunk has been reduced, merge is used to combine all of the reduction maps into a single combination map.
- Two functions not shown are process_extra_data, which initializes the combination map, and convert, which converts the combination map into an output result.
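Smart itself is a C++/MPI system, so the sketch below is only a hypothetical Scala mirror of the dataflow just described: gen_key routes each element to a reduction object, accumulate updates a per-chunk reduction map, merge folds the reduction maps into one combination map, and post_combine/convert finish the job. Every name and type here is illustrative, not Smart’s real API.

```scala
// Hypothetical Scala mirror of Smart's reduction/combination dataflow.
trait ReductionObject { def merge(other: ReductionObject): ReductionObject }

abstract class SmartJob[Elem, Out] {
  def genKey(e: Elem): Int                               // choose a reduction object
  def accumulate(e: Elem, r: ReductionObject): ReductionObject
  def initial(key: Int): ReductionObject                 // plays the role of process_extra_data
  def postCombine(m: Map[Int, ReductionObject]): Map[Int, ReductionObject]
  def convert(m: Map[Int, ReductionObject]): Out

  // Reduction phase: each data chunk is folded into its own reduction map.
  def reduceChunk(chunk: Seq[Elem]): Map[Int, ReductionObject] =
    chunk.foldLeft(Map.empty[Int, ReductionObject]) { (m, e) =>
      val k = genKey(e)
      m.updated(k, accumulate(e, m.getOrElse(k, initial(k))))
    }

  // Combination phase: merge all reduction maps into one combination map.
  def run(chunks: Seq[Seq[Elem]]): Out = {
    val combined = chunks.map(reduceChunk).reduce { (a, b) =>
      b.foldLeft(a) { case (m, (k, r)) =>
        m.updated(k, m.get(k).map(_.merge(r)).getOrElse(r))
      }
    }
    convert(postCombine(combined))
  }
}
```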
3.1 System Overview
- At its core, Smart-MLlib is a Scala-based API that is used to execute machine-learning algorithms on Smart.
- Connecting a JVM-based language, like Scala, with a native language, like C++, is not a new problem.
- Smart is not just a C++ library, but a C++ library that uses MPI to distribute work across clusters of nodes.
- Step 2: The Scala API prepares an mpiexec command, complete with all arguments needed by Smart. Step 3: Using scala.sys.process, the Smart job is executed with the mpiexec command prepared in Step 2.
- The Scala API returns the model from Step 7 to the user as a Scala object.
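The launch path in Steps 2 and 3 uses the real scala.sys.process package; in the sketch below, the binary name, flags, and model-file handling are hypothetical stand-ins for whatever the actual Smart executable expects.

```scala
import scala.sys.process._
import scala.io.Source

// Sketch of the launch path described above: build an mpiexec command,
// run the Smart binary as an external process, then read back the model
// it saved to disk. Paths and flags are hypothetical examples.
object SmartLauncher {
  def runJob(nodes: Int, dataPath: String, modelPath: String): Seq[String] = {
    // Step 2: prepare the mpiexec command with all arguments Smart needs
    val cmd = Seq("mpiexec", "-n", nodes.toString, "./smart_job", dataPath, modelPath)

    // Step 3: execute the Smart job; `!` blocks until it finishes and returns
    // the exit status, the only feedback Scala gets directly from Smart
    val exitStatus = cmd.!
    require(exitStatus == 0, s"Smart job failed with exit status $exitStatus")

    // Later steps: load the model the job saved to persistent storage
    val src = Source.fromFile(modelPath)
    try src.getLines().toList finally src.close()
  }
}
```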
3.1.1 System Advantages
- Architecting the system in the ways outlined above provides several advantages over other possible designs.
- First, calling Smart as an external command, as it would be called without the MLlib wrapper, ensures that all necessary runtime configurations are properly set up.
- Beyond the simplification of launching Smart jobs, the architecture also forces every MLlib algorithm to define a savable model.
- As shown in Step 5 of Figure 3.1, this requirement comes from the Smart executable, which finishes only after saving a model to disk.
- Additionally, since the model is saved to persistent storage, if the JVM process crashes after a Smart job terminates, the machine-learning job doesn’t need to be executed again.
3.1.2 System Disadvantages
- While the system architecture utilized has several benefits, it also introduces a few disadvantages.
- The only really usable information Scala gets directly from Smart is the exit status after the Smart job finishes execution.
- In addition to the loss of control over the execution of Smart jobs, the architecture also brings additional latency into the system.
- Writing and reading from disk is a very slow way to communicate between two processes.
- Fortunately, these failures are infrequent, and many of the issues that do appear can easily be handled in the language in which they occur.
3.2 Smart-MLlib Walkthrough
- In Section 3.1, an overview of the system architecture was presented.
- The main tasks for which these applications are responsible are then enumerated.
4.1 K-Means Clustering
- The first algorithm to be covered is k-means clustering (k-means).
- K-means is an unsupervised machine-learning technique used to separate a dataset into k groups such that each group contains similar patterns [23].
- K-means, in particular, typically uses the Euclidean distance between two patterns as the measure of similarity [10].
- The goal of the algorithm is to group all of the patterns into k clusters in such a way that the sum of the squared distances between every data pattern and its assigned cluster center is minimized [23].
- The basic k-means algorithm works as follows: 1) the initial k centers are set; 2) each data point in the dataset is assigned to the nearest center; 3) each center recomputes its location as the mean of all data points assigned to it; and 4) Steps 2 and 3 are repeated until a stopping condition is met.
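The four steps translate directly into a plain, single-machine Scala sketch (not Smart’s distributed implementation); taking the first k points as the initial centers is just one common seeding choice.

```scala
// A plain-Scala sketch of the four k-means steps (illustrative only).
object KMeansSketch {
  type Point = Vector[Double]

  def dist2(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def mean(ps: Seq[Point]): Point =
    ps.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / ps.size)

  def kmeans(data: Seq[Point], k: Int, iters: Int): Seq[Point] = {
    var centers = data.take(k)                       // 1) set the initial k centers
    for (_ <- 1 to iters) {                          // 4) repeat steps 2 and 3
      val clusters =
        data.groupBy(p => centers.minBy(c => dist2(p, c)))             // 2) assign
      centers = centers.map(c => clusters.get(c).map(mean).getOrElse(c)) // 3) recompute
    }
    centers
  }
}
```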
4.1.1 Smart’s Implementation
- To fully describe Smart’s k-means implementation, both the reduction object and core API functions need to be defined.
- The pseudocode for these items can be seen in Listing 4.1, which was adapted from Smart’s original paper [28].
- Next, merge accumulates all the reduction maps produced by accumulate into a single combination map.
- The final combination map, which holds one ClusterObj for each cluster in the algorithm, is then updated by post_combine.
4.1.2 Smart-MLlib Interface
- The k-means API in Smart-MLlib is currently implemented as a single function.
- Table 4.1 gives the API and describes all of the formal parameters that may be specified.
- From examining the table, it is clear that the interface supplies options for declaring the initial k-means model, determining the number of clusters to use (i.e. k), and setting the number of iterations for the algorithm.
- The parameters unused for these tasks provide general information on the data being processed and the environment Smart is using to execute the distributed algorithm.
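Since Table 4.1 is not reproduced in this summary, the snippet below only suggests the call shape; every parameter name is hypothetical, chosen to reflect the options the table is described as providing (an initial model, the cluster count k, an iteration limit, and data/environment details for the Smart launch).

```scala
// Hypothetical sketch only: the real Smart-MLlib signature is in Table 4.1.
object SmartKMeans {
  case class KMeansModel(centers: Seq[Vector[Double]])

  def train(dataPath: String,                  // where the input data lives
            numFeatures: Int,                  // dimensionality of each pattern
            k: Int,                            // number of clusters
            numIterations: Int,                // fixed iteration count
            initialModel: Option[KMeansModel], // optional starting model
            numNodes: Int                      // nodes for the mpiexec launch
           ): KMeansModel =
    ??? // would prepare and launch the Smart job as described in Section 3.1
}

// Example usage with the experiment settings from Chapter 5
// (16-dimensional input, k = 4, 1000 iterations, 8 nodes):
object Example {
  val model = SmartKMeans.train("points.bin", numFeatures = 16, k = 4,
                                numIterations = 1000, initialModel = None,
                                numNodes = 8)
}
```

The linear regression, GMM, and SVM interfaces covered later follow the same pattern, differing mainly in their algorithm-specific parameters (e.g. learningRate, regParam, or the number of Gaussians).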
4.2 Linear Regression
- Linear regression is a technique used to model the relationship between variables [29].
- In the version the authors discuss, the idea is to determine the linear combination of independent variables that will best explain a single dependent variable within a dataset.
- For a more comprehensive explanation of the linear regression algorithm, please refer to Spark’s MLlib guide [24].
4.2.1 Smart’s Implementation
- The Smart implementation for linear regression can be fully described by defining both the reduction object and the core API functions for the algorithm.
- In addition, WeightObj is responsible for accumulating the number of points processed and the sum of weighted errors detected throughout the linear regression program.
- In accumulate, the input pair is processed, and both the size and the sum of weighted errors, which is based on the input vector and the current model weights, are reduced into the reduction object. The merge function then further accumulates the reduction maps produced by accumulate into a single combination map.
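Given the accumulated count n and sum of weighted errors, the model update this supports takes the standard least-squares gradient-descent form (a reconstruction from the description above, not Smart’s code verbatim):

$$w \leftarrow w - \eta \cdot \frac{1}{n} \sum_{i=1}^{n} \left( w^{\top} x_i - y_i \right) x_i,$$

where $\eta$ is the learning rate exposed by the interface in Section 4.2.2.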
4.2.2 Smart-MLlib Interface
- The Smart-MLlib linear regression API is shown in Table 4.2.
- The interface provides options for altering both the learning rate of the algorithm as well as the number of iterations the algorithm should perform.
- In addition to these parameters, the API also allows users to specify an initial model and gives an option for including a bias or intercept term in the linear-regression computation.
- The remaining parameters provide relevant information about the data being processed and the cluster environment Smart is using to execute the distributed algorithm.
4.3 Gaussian Mixture Model
- A Gaussian mixture model (GMM) is a probability distribution that is constructed using a weighted combination of k Gaussian functions.
- The aim of training a GMM is to modify the weights (i.e. linear coefficients), means, and covariance matrices of the Gaussian functions in order to maximize the likelihood that a particular dataset could be generated by the mixture model [19].
- Typically, a GMM is trained through the utilization of the expectation-maximization (EM) algorithm [19, 18].
- Within the context of GMM, the EM algorithm works as follows: 1) the initial k Gaussians are selected; 2) the responsibility of each Gaussian to every data point is determined; 3) based on the responsibilities computed in the previous step, the Gaussian weights, means, and covariance matrices are updated; and 4) Steps 2 and 3 are repeated until convergence.
- For a more comprehensive explanation of the GMM algorithm, please refer to Spark’s MLlib guide [24].
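For reference, the standard EM updates behind Steps 2 and 3 are as follows. The E-step computes each Gaussian’s responsibility for each point,

$$\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)},$$

and the M-step re-estimates the weights, means, and covariance matrices with $N_k = \sum_{i=1}^{n} \gamma_{ik}$:

$$\pi_k = \frac{N_k}{n}, \qquad \mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik}\, x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^{\top}.$$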
4.3.1 Smart’s Implementation
- Smart’s GMM implementation can be described through defining the algorithm’s reduction object and core API functions.
- Since this algorithm only requires a single reduction object, gen_key simply returns a constant number.
4.3.2 Smart-MLlib Interface
- Since the Gaussian mixture model algorithm is similar in flavor to the k-means algorithm, the Smart-MLlib interface for these applications is almost identical.
- As Table 4.3 shows, the GMM interface provides options for specifying the initial model, the number of Gaussians to use, and the number of iterations for the algorithm to complete.
- The other parameters are used to specify both the data being processed and the system on which the Smart job will be running.
4.4 Support Vector Machine
- Support vector machines (SVMs) are used to classify data into two groups.
- Unlike various other types of classifiers that do not make determinations on the “goodness” of a classification (e.g. perceptrons [20]), SVMs attempt to optimally classify datasets [10].
- The classification is defined with a hyperplane that separates the dataset into two classes.
- For a more comprehensive explanation of the SVM algorithm, please refer to Spark’s MLlib guide [24].
4.4.1 Smart’s Implementation
- The Smart implementation for SVM can be described by defining the algorithm’s reduction object and core API functions.
- As this algorithm requires only one reduction object per reduction map, gen_key always returns a constant number.
- Using this master combination map, post_combine performs a gradient-descent update on the weights of the model using all of the accumulated values.
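One standard form of such a regularized hinge-loss update, consistent with the learningRate ($\eta$) and regParam ($\lambda$) parameters described in Section 4.4.2 (again a reconstruction, not Smart’s code verbatim), is

$$w \leftarrow w - \eta \left( \lambda w - \frac{1}{n} \sum_{i \,:\, y_i w^{\top} x_i < 1} y_i x_i \right),$$

where the sum runs over the points that currently violate the margin.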
4.4.2 Smart-MLlib Interface
- As Table 4.4 suggests, the Smart-MLlib SVM API provides options for specifying the initial SVM model, setting the number of iterations, and declaring if the SVM model being produced should include a bias term.
- In addition to these parameters, the interface also gives users the opportunity to alter the algorithm’s learning rule through the learningRate and regParam parameters.
- As with the other Smart-MLlib interfaces discussed, the remaining parameters exist to provide information on the data being processed and the system being used to execute the SVM algorithm.
5.1 Environment
- The authors’ experiments were all conducted on the same homogeneous, multi-core computing cluster.
- Specifically, their tests were performed using 4-, 8-, 16-, and 32-node configurations.
- The processors on each node have a combined total of eight computing cores that run at a base frequency of 2.53 GHz.
- The Spark experiments, on the other hand, used Spark’s standalone cluster for communication.
- This translates to Smart using one MPI process and eight OpenMP threads per node and Spark using one executor and eight executor cores per node.
5.2 K-Means Clustering Experiments
- For the k-means experiments, the performance of the basic k-means implementation was compared between Smart-MLlib and Spark’s MLlib.
- The performance reported for each configuration is an average of five independent trials.
- In all of the tests, k-means was run with four cluster centers for exactly 1000 iterations on 16-dimensional input.
- Since Spark will stop iterating when a default convergence condition is met, the source code was modified to ensure all iterations actually occurred.
5.2.1 Results
- The results of all the k-means experiments can be seen in Figure 5.1.
- In addition to outperforming Spark in head-to-head experiments, Smart-MLlib also out-scaled Spark’s MLlib.
- Figure 5.2 shows this by tracking Smart’s speedup over Spark while increasing the number of nodes.
- As these results are consistent with all other results within this chapter, please refer to Section 5.6 for a detailed analysis.
5.3 Linear Regression Experiments
- For the linear regression experiments, the performance of the linear regression implementations in Smart-MLlib and Spark’s MLlib was compared.
- As in Section 5.2, the results reported for each configuration are an average of five independent trials.
- In all of the tests, the linear regression processed input with 15 dimensions and 1 output dimension and ran for exactly 1000 iterations.
- Note that the Spark source code had to be modified to guarantee all 1000 iterations of the algorithm were completed.
5.3.1 Results
- The results of the linear regression experiments can be seen in Figure 5.3.
- From examining the graphs, it is clear that the Smart-MLlib implementation outperforms Spark’s in every configuration.
- As with k-means, the Smart-MLlib version also scales better than Spark’s.
- Figure 5.4 shows this superior scaling graphically.
- As these results are consistent with all other results in this chapter, please refer to Section 5.6 for a detailed analysis.
5.4 Gaussian Mixture Model Experiments
- The Gaussian mixture model (GMM) experiments were conducted to compare the performance of GMM on Smart-MLlib and Spark’s MLlib.
- The results reported for each configuration are the average of five independent tests.
- Since GMM takes substantially longer to execute than the other algorithms covered, each trial was only run for 100 iterations using a four-Gaussian model.
5.4.1 Results
- The Gaussian mixture model results can be seen in Figure 5.5.
- Results from k-means, linear regression, and SVM show Smart having roughly a 2- to 15-times advantage over Spark; however, for the GMM tests, this range balloons to a 13- to 54-times advantage.
- In all other algorithms presented, the Smart-MLlib implementation scales better than Spark’s in all cases.
- In Figure 5.6, the authors see a single negatively sloping line segment for both input sizes plotted.
- The authors suspect that this decrease in per-node input size allowed each Spark executor to cache one or more additional RDDs, resulting in significantly improved performance.
5.5 SVM Experiments
- For the SVM experiments, the performance of the linear SVM implementation is compared between Smart-MLlib and Spark’s MLlib.
- Each test ran for exactly 1000 iterations on samples with 15 input dimensions and 1 output dimension.
- Figure 5.8 shows this graphically through the positively sloped line segments.
- As the number of nodes increases, Smart-MLlib’s SVM performance gets better relative to Spark’s MLlib.
5.6 Analysis and Discussion
- All of the results presented in this chapter show that, for the algorithms discussed, Smart-MLlib performs strictly better than Spark’s MLlib.
- In every configuration tested, the Smart implementation performed at least 90% better than the Spark implementation.
- The performance advantages of Smart result from three key differences between Smart and Spark [28].
References
- [30] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster Computing with Working Sets. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2010.
Frequently Asked Questions
Q2. What are the common methods used to show consensus in multi-agent systems?
In order to show consensus in multi-agent systems with time-varying network structures, stochastic matrix theory [5]–[7], [10] and convexity analysis [11] are often applied.
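The connection to stochastic matrices is direct: in the prototypical discrete-time consensus update, each agent moves to a weighted average of its own and its neighbors’ states,

$$x_i(k+1) = \sum_{j \in \mathcal{N}_i(k) \cup \{i\}} w_{ij}(k)\, x_j(k), \qquad \text{or, stacked,} \qquad x(k+1) = W(k)\, x(k),$$

and each $W(k)$ is row-stochastic, so consensus under time-varying topologies reduces to the convergence of products of stochastic matrices.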
Q3. What are some of the potential applications of distributed surveillance?
Distributed surveillance has a number of potential applications, such as border security guarding, forest fire monitoring, and oil spill patrolling.
Q4. How can the authors solve flocking with a dynamic group reference?
If enough information of the group reference is known, such as acceleration and/or velocity information of the group reference, flocking with a dynamic group reference can be solved by employing a gradient-based control law [203]–[205].
Q5. What is the meaning of distributed task assignment?
Distributed task assignment refers to the study of task assignment of a group of dynamical agents in a distributed manner, which can be roughly categorized into coverage control, scheduling, and surveillance.
Q6. What is the main reason for the study of consensus under various system dynamics?
Although the study of consensus under various system dynamics is due to the existence of complex dynamics in practical systems, it is also interesting to observe that system dynamics play an important role in determining the final consensus state.
Q7. What is the purpose of the formation tracking problem?
An interesting problem in formation tracking is to design a distributed control algorithm to drive a team of agents to track some desired state.
Q8. How can the authors convert a formation tracking problem to a traditional stability problem?
The formation tracking problem can be converted to a traditional stability problem by redefining the variables as the errors between each agent’s state and the group reference.
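Concretely, with group reference $x_r(t)$ and desired offset $d_i$ for agent $i$, defining the error

$$e_i(t) = x_i(t) - x_r(t) - d_i$$

turns formation tracking into the requirement that $e_i(t) \to 0$ for all agents, a standard stability problem in the error coordinates.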
Q9. Why is it important to study the effects of time delay on the stability of consensus?
Because time delay might affect the system stability, it is important to study under what conditions consensus can still be guaranteed even if time delay exists.
Q10. What is the consensus problem in the deterministic setting?
Other times, when considering random communication failures, random packet drops, communication channel instabilities inherent in physical communication channels, etc., it is necessary and important to study the consensus problem in a stochastic setting, where the network topology evolves according to some random distribution.
Q11. What is the main research question for consensus in a sampled-data framework?
Note that the existing research on consensus in a sampled-data framework mainly focuses on simple system dynamics, so that the closed-loop system can be represented in terms of a linear matrix equation.
Q12. What is the main approach to maintaining the connectivity of a team of agents?
The main approach to maintaining the connectivity of a team of agents is to define some artificial potentials (between any pair of agents) in a proper way such that if two agents are neighbors initially then they will always communicate with each other thereafter [206], [219]–[228].
Q13. What is the common approach to solve the optimization problem for a ring type of network?
In [233], an incremental subgradient approach was used to solve the optimization problem for a ring type of network.
Q14. Why is formation tracking more difficult than the group reference?
Due to the existence of the group reference, formation tracking is usually much more challenging than formation producing and control algorithms for the latter might not be useful for the former.
Q15. What are the main advantages of the distributed approach?
Although both approaches are considered practical depending on the situations and conditions of the real applications, the distributed approach is believed more promising due to many inevitable physical constraints such as limited resources and energy, short wireless communication ranges, narrow bandwidths, and large sizes of vehicles to manage and control.
Q16. What is the difficult problem to incorporate into a unified methodology?
It remains a challenging problem to incorporate both the dynamics of consensus and probabilistic (Kalman) filtering into a unified methodology.