
Showing papers on "Benchmark (computing) published in 2019"


Journal ArticleDOI
01 Feb 2019
TL;DR: A new nature-inspired algorithm, the butterfly optimization algorithm (BOA), which mimics the food search and mating behavior of butterflies, is proposed for global optimization problems; results indicate that BOA is more efficient than other metaheuristic algorithms.
Abstract: Real-world problems are complex, as they are multidimensional and multimodal in nature, which encourages computer scientists to develop better and more efficient problem-solving methods. Nature-inspired metaheuristics have shown better performance than traditional approaches. To date, researchers have presented and experimented with various nature-inspired metaheuristic algorithms to handle various search problems. This paper introduces a new nature-inspired algorithm, the butterfly optimization algorithm (BOA), which mimics the food search and mating behavior of butterflies, to solve global optimization problems. The framework is mainly based on the foraging strategy of butterflies, which use their sense of smell to determine the location of nectar or a mating partner. In this paper, the proposed algorithm is tested and validated on a set of 30 benchmark test functions and its performance is compared with other metaheuristic algorithms. BOA is also employed to solve three classical engineering problems (spring design, welded beam design, and gear train design). Results indicate that the proposed BOA is more efficient than other metaheuristic algorithms.
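
As a rough illustration of the fragrance-driven search the abstract describes, below is a minimal Python sketch of a BOA-style update loop. The fragrance model f = c·I^a, the switch probability between global and local search, and the parameter values follow the commonly cited formulation of the algorithm; the greedy replacement and the toy objective are my own simplifications, not the authors' reference implementation.

```python
import numpy as np

def sphere(x):
    """Toy objective: minimize the sum of squares."""
    return np.sum(x ** 2)

def boa(obj, dim=5, n_butterflies=30, iters=200, c=0.01, a=0.1, p_switch=0.8, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(n_butterflies, dim))
    fitness = np.apply_along_axis(obj, 1, pop)
    best = pop[np.argmin(fitness)].copy()

    for _ in range(iters):
        # Stimulus intensity is tied to fitness; fragrance f = c * I^a.
        intensity = 1.0 / (1.0 + fitness)          # higher for better (lower) fitness
        fragrance = c * intensity ** a
        for i in range(n_butterflies):
            r = rng.random()
            if rng.random() < p_switch:
                # Global search phase: move towards the best butterfly.
                step = (r ** 2) * best - pop[i]
            else:
                # Local search phase: move relative to two random butterflies.
                j, k = rng.integers(n_butterflies, size=2)
                step = (r ** 2) * pop[j] - pop[k]
            candidate = pop[i] + step * fragrance[i]
            cand_fit = obj(candidate)
            if cand_fit < fitness[i]:               # greedy replacement (a simplification)
                pop[i], fitness[i] = candidate, cand_fit
        best = pop[np.argmin(fitness)].copy()
    return best, obj(best)

if __name__ == "__main__":
    x_best, f_best = boa(sphere)
    print("best value found:", f_best)
```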

865 citations


Journal ArticleDOI
TL;DR: This survey provides a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors, and lists the traditional and new applications.
Abstract: Object detection is one of the most important and challenging branches of computer vision. It is widely applied in everyday life, for example in security monitoring and autonomous driving, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning algorithms for detection tasks, the performance of object detectors has been greatly improved. In order to understand the main development status of the object detection pipeline thoroughly and deeply, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. We then provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications, and some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.

749 citations


Proceedings Article
28 Mar 2019
TL;DR: This paper introduces ImageNet-C, which standardizes and expands the corruption robustness topic while showing which classifiers are preferable in safety-critical applications, and proposes a new dataset called ImageNet-P, which enables researchers to benchmark a classifier's robustness to common perturbations.
Abstract: In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.
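
Corruption-robustness results of this kind are usually summarized as a Corruption Error averaged over severity levels and normalized by a baseline classifier (AlexNet in the paper). The sketch below shows one way that aggregation can be computed; the error tables are hypothetical numbers for illustration, not results from the paper.

```python
import numpy as np

def corruption_error(model_err, baseline_err):
    """
    model_err, baseline_err: dicts mapping corruption name ->
    list of top-1 error rates at severities 1..5.
    Returns per-corruption CE and the mean CE (mCE) relative to the baseline.
    """
    ce = {}
    for corruption, errs in model_err.items():
        num = np.sum(errs)                       # sum over severities for the model
        den = np.sum(baseline_err[corruption])   # same sum for the baseline classifier
        ce[corruption] = num / den
    return ce, float(np.mean(list(ce.values())))

# Hypothetical numbers for two corruptions at five severities.
resnet_err = {"gaussian_noise": [0.30, 0.42, 0.55, 0.68, 0.80],
              "fog":            [0.25, 0.33, 0.41, 0.52, 0.63]}
alexnet_err = {"gaussian_noise": [0.55, 0.68, 0.78, 0.86, 0.92],
               "fog":            [0.50, 0.60, 0.68, 0.78, 0.85]}

per_corruption, mce = corruption_error(resnet_err, alexnet_err)
print(per_corruption, "mCE:", mce)
```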

736 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: LaSOT is presented, a high-quality benchmark for Large-scale Single Object Tracking that consists of 1,400 sequences with more than 3.5M frames in total, and is the largest, to the best of the authors' knowledge, densely annotated tracking benchmark.
Abstract: In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking. LaSOT consists of 1,400 sequences with more than 3.5M frames in total. Each frame in these sequences is carefully and manually annotated with a bounding box, making LaSOT the largest densely annotated tracking benchmark to the best of our knowledge. The average video length of LaSOT is more than 2,500 frames, and each sequence comprises various challenges deriving from the wild, where target objects may disappear and reappear in the view. By releasing LaSOT, we expect to provide the community with a large-scale, high-quality dedicated benchmark for both the training of deep trackers and the veritable evaluation of tracking algorithms. Moreover, considering the close connection between visual appearance and natural language, we enrich LaSOT by providing additional language specifications, aiming at encouraging the exploration of natural linguistic features for tracking. A thorough experimental evaluation of 35 tracking algorithms on LaSOT is presented with detailed analysis, and the results demonstrate that there is still large room for improvement.
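
Benchmarks like this one are typically scored with bounding-box overlap between tracker output and the dense annotations. The snippet below sketches a generic success-curve computation (IoU swept over thresholds); the box format and threshold grid are assumptions for illustration, and this is not LaSOT's official evaluation toolkit.

```python
import numpy as np

def iou(box_a, box_b):
    """Boxes are (x, y, w, h); returns intersection-over-union."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax1 + aw, bx1 + bw), min(ay1 + ah, by1 + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_curve(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames whose IoU exceeds each threshold (the 'success plot')."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return np.array([(overlaps > t).mean() for t in thresholds])

# Hypothetical two-frame example.
preds = [(10, 10, 50, 50), (12, 14, 48, 52)]
gts   = [(12, 11, 50, 48), (60, 60, 40, 40)]
print(success_curve(preds, gts))
```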

653 citations


Proceedings ArticleDOI
25 Jul 2019
TL;DR: In this article, the authors propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search, which keeps the functionality of a neural network while changing its neural architecture, enabling more efficient training during the search.
Abstract: Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet, PNAS, usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling more efficient training during the search. In this paper, we propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search. The framework develops a neural network kernel and a tree-structured acquisition function optimization algorithm to efficiently explore the search space. Extensive experiments on real-world benchmark datasets have been carried out to demonstrate the superior performance of the developed framework over the state-of-the-art methods. Moreover, we build an open-source AutoML system based on our method, namely Auto-Keras. The code and documentation are available at https://autokeras.com. The system runs in parallel on CPU and GPU, with an adaptive search strategy for different GPU memory limits.

563 citations


24 Oct 2019
TL;DR: An open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks is proposed to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks.

487 citations


Posted Content
TL;DR: PIXOR as discussed by the authors is a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions, which is designed to balance high accuracy and real-time efficiency.
Abstract: We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Computation speed is critical as detection is a necessary component for safety. Existing approaches are, however, expensive in computation due to the high dimensionality of point clouds. We utilize the 3D data more efficiently by representing the scene from the Bird's Eye View (BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions. The input representation, network architecture, and model optimization are especially designed to balance high accuracy and real-time efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection benchmark, and a large-scale 3D vehicle detection benchmark. On both datasets we show that the proposed detector surpasses other state-of-the-art methods notably in terms of Average Precision (AP), while still running at >28 FPS.
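
The central idea in the abstract, representing the point cloud in Bird's Eye View so that a standard 2D network can consume it, can be illustrated with a simple occupancy-and-height grid. The grid extents, resolution, and channel layout below are assumed values for illustration, not PIXOR's exact input encoding.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                  z_range=(-2.5, 1.0), resolution=0.1):
    """
    points: (N, 3) array of LiDAR (x, y, z) in metres.
    Returns a (H, W, 2) grid with an occupancy channel and a max-height channel.
    """
    h = int((y_range[1] - y_range[0]) / resolution)
    w = int((x_range[1] - x_range[0]) / resolution)
    bev = np.zeros((h, w, 2), dtype=np.float32)

    # Keep only points inside the region of interest.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    # Discretize x/y into pixel indices.
    cols = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    rows = ((pts[:, 1] - y_range[0]) / resolution).astype(int)

    bev[rows, cols, 0] = 1.0                               # occupancy
    np.maximum.at(bev[:, :, 1], (rows, cols), pts[:, 2])   # max height per cell (empty cells stay 0)
    return bev

# Hypothetical cloud of 1000 random points.
cloud = np.random.uniform([0, -40, -2.5], [70, 40, 1.0], size=(1000, 3))
print(points_to_bev(cloud).shape)   # (800, 700, 2)
```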

329 citations


Proceedings ArticleDOI
20 Mar 2019
TL;DR: Benchmark results show that this approach has significantly lower runtime than other recent detectors and that it achieves state-of-the-art performance when compared on a large dataset that has enough data to overcome the challenges of training on the range view.
Abstract: In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. The efficiency results from processing LiDAR data in the native range view of the sensor, where the input data is naturally compact. Operating in the range view involves well known challenges for learning, including occlusion and scale variation, but it also provides contextual information based on how the sensor data was captured. Our approach uses a fully convolutional network to predict a multimodal distribution over 3D boxes for each point and then it efficiently fuses these distributions to generate a prediction for each object. Experiments show that modeling each detection as a distribution rather than a single deterministic box leads to better overall detection performance. Benchmark results show that this approach has significantly lower runtime than other recent detectors and that it achieves state-of-the-art performance when compared on a large dataset that has enough data to overcome the challenges of training on the range view.
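
The "native range view" mentioned above is essentially a spherical projection of the LiDAR sweep into a 2D image indexed by azimuth and elevation. The following sketch shows one such projection with an assumed field of view and image size; it is a generic illustration rather than LaserNet's sensor-specific mapping.

```python
import numpy as np

def range_view(points, h=64, w=512, fov_up_deg=3.0, fov_down_deg=-25.0):
    """
    points: (N, 3) LiDAR (x, y, z). Returns an (h, w) range image where each
    pixel holds the distance of the point that fell into that azimuth/elevation bin.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2) + 1e-9

    azimuth = np.arctan2(y, x)                      # horizontal angle in [-pi, pi]
    elevation = np.arcsin(z / r)                    # vertical angle

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    cols = ((azimuth + np.pi) / (2 * np.pi) * w).astype(int).clip(0, w - 1)
    rows = ((fov_up - elevation) / (fov_up - fov_down) * h).astype(int).clip(0, h - 1)

    image = np.zeros((h, w), dtype=np.float32)
    image[rows, cols] = r                           # last point in a bin wins (a simplification)
    return image

cloud = np.random.uniform([-50, -50, -3], [50, 50, 2], size=(2000, 3))
print(range_view(cloud).shape)   # (64, 512)
```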

321 citations


Journal ArticleDOI
TL;DR: This paper investigates the resource allocation algorithm design for multicarrier solar-powered unmanned aerial vehicle (UAV) communication systems and proposes a low-complexity iterative suboptimal online scheme based on the successive convex approximation.
Abstract: In this paper, we investigate the resource allocation algorithm design for multicarrier solar-powered unmanned aerial vehicle (UAV) communication systems. In particular, the UAV is powered by the solar energy enabling sustainable communication services to multiple ground users. We study the joint design of the 3D aerial trajectory and the wireless resource allocation for maximization of the system sum throughput over a given time period. As a performance benchmark, we first consider an off-line resource allocation design assuming non-causal knowledge of the channel gains. The algorithm design is formulated as a mixed-integer non-convex optimization problem taking into account the aerodynamic power consumption, solar energy harvesting, a finite energy storage capacity, and the quality-of-service requirements of the users. Despite the non-convexity of the optimization problem, we solve it optimally by applying monotonic optimization to obtain the optimal 3D-trajectory and the optimal power and subcarrier allocation policy. Subsequently, we focus on the online algorithm design that only requires real-time and statistical knowledge of the channel gains. The optimal online resource allocation algorithm is motivated by the off-line scheme and entails a high computational complexity. Hence, we also propose a low-complexity iterative suboptimal online scheme based on the successive convex approximation. Our simulation results reveal that both the proposed online schemes closely approach the performance of the benchmark off-line scheme and substantially outperform two baseline schemes. Furthermore, our results unveil the tradeoff between solar energy harvesting and power-efficient communication. In particular, the solar-powered UAV first climbs up to a high altitude to harvest a sufficient amount of solar energy and then descends again to a lower altitude to reduce the path loss of the communication links to the users it serves.
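
In highly simplified form, the joint trajectory and resource-allocation problem described above can be written as a sum-throughput maximization over the UAV position q[n], transmit powers p_{k,m}[n], and subcarrier assignments s_{k,m}[n]. The formulation below is an illustrative abstraction with my own symbols, not the paper's full mixed-integer non-convex problem.

```latex
\begin{aligned}
\max_{\mathbf{q}[n],\, p_{k,m}[n],\, s_{k,m}[n]} \quad
  & \sum_{n=1}^{N}\sum_{k=1}^{K}\sum_{m=1}^{M}
      s_{k,m}[n]\,\log_2\!\Big(1+\frac{p_{k,m}[n]\,h_k(\mathbf{q}[n])}{\sigma^2}\Big) \\
\text{s.t.}\quad
  & \sum_{k} s_{k,m}[n] \le 1,\quad s_{k,m}[n]\in\{0,1\}
      && \text{(each subcarrier serves at most one user)}\\
  & E[n+1] = \min\Big\{E_{\max},\; E[n] + E_{\mathrm{solar}}(\mathbf{q}[n])
      - E_{\mathrm{fly}}(\mathbf{q}[n]) - \textstyle\sum_{k,m} p_{k,m}[n]\,\tau\Big\},\quad E[n]\ge 0
      && \text{(energy causality, finite storage)}\\
  & \sum_{n,m} s_{k,m}[n]\,\log_2\!\Big(1+\frac{p_{k,m}[n]\,h_k(\mathbf{q}[n])}{\sigma^2}\Big) \ge R_{k,\min}
      && \text{(per-user QoS)}\\
  & \|\mathbf{q}[n+1]-\mathbf{q}[n]\| \le v_{\max}\,\tau
      && \text{(maximum UAV speed)}
\end{aligned}
```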

273 citations


Proceedings ArticleDOI
03 May 2019
TL;DR: It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.
Abstract: Self-supervised learning aims to learn representations from the data itself without explicit manual supervision. Existing efforts ignore a crucial aspect of self-supervised learning - the ability to scale to large amount of data because self-supervision requires no manual labels. In this work, we revisit this principle and scale two popular self-supervised approaches to 100 million images. We show that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation (3D) and visual navigation using reinforcement learning. Scaling these methods also provides many interesting insights into the limitations of current self-supervised techniques and evaluations. We conclude that current self-supervised methods are not 'hard' enough to take full advantage of large scale data and do not seem to learn effective high level semantic representations. We also introduce an extensive benchmark across 9 different datasets and tasks. We believe that such a benchmark along with comparable evaluation settings is necessary to make meaningful progress. Code is at: https://github.com/facebookresearch/fair_self_supervision_benchmark.

264 citations


Journal ArticleDOI
01 Apr 2019
TL;DR: The memEAPF proposal consists of delimited compartments where multisets of parameters evolve according to rules of biochemical inspiration to minimize the path length, and it exhibits a better performance regarding path length.
Abstract: In this paper, a membrane evolutionary artificial potential field (memEAPF) approach for solving the mobile robot path planning problem is proposed, which combines membrane computing with a genetic algorithm (a membrane-inspired evolutionary algorithm with a one-level membrane structure) and the artificial potential field method to find the parameters that generate a feasible and safe path. The memEAPF proposal consists of delimited compartments where multisets of parameters evolve according to rules of biochemical inspiration to minimize the path length. The proposed approach is compared with artificial-potential-field-based path planning methods with respect to their planning performance on a set of twelve benchmark test environments, and it exhibits better performance regarding path length. Experiments demonstrating the statistical significance of the improvements achieved by the proposed approach in static and dynamic environments are shown. Moreover, implementation results using parallel architectures prove the effectiveness and practicality of the proposal for obtaining solutions in considerably less time.
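
The artificial potential field that memEAPF tunes combines an attractive pull toward the goal with repulsive pushes from nearby obstacles. A minimal gradient-descent planner over those standard potentials is sketched below; the gains k_att and k_rep and the influence radius d0 are exactly the kind of parameters an outer evolutionary search would optimize, and the values here are placeholders rather than the paper's.

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0, step=0.05):
    """One gradient-descent step on the classic attractive + repulsive potential."""
    # Attractive force: -grad(0.5 * k_att * ||pos - goal||^2)
    force = -k_att * (pos - goal)
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            # Repulsive force: -grad(0.5 * k_rep * (1/d - 1/d0)^2)
            force += k_rep * (1.0 / d - 1.0 / d0) * diff / d ** 3
    return pos + step * force / (np.linalg.norm(force) + 1e-9)

def plan(start, goal, obstacles, max_steps=2000, tol=0.1, **gains):
    path = [np.asarray(start, dtype=float)]
    goal = np.asarray(goal, dtype=float)
    obstacles = [np.asarray(o, dtype=float) for o in obstacles]
    while len(path) < max_steps and np.linalg.norm(path[-1] - goal) > tol:
        path.append(apf_step(path[-1], goal, obstacles, **gains))
    return np.array(path)

path = plan(start=(0, 0), goal=(10, 10), obstacles=[(4.5, 5.5), (7.0, 5.0)])
print("steps:", len(path),
      "path length:", np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))
```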

Journal ArticleDOI
TL;DR: A hybrid optimization method is presented for the FS problem; it combines the salp swarm algorithm (SSA) with particle swarm optimization (PSO) to create an algorithm called SSAPSO, in which the efficacy of the exploration and exploitation steps is improved.
Abstract: Feature selection (FS) is a machine learning process commonly used to reduce the high-dimensionality problems of datasets. This task permits extracting the most representative information from large pools of data, reducing the computational effort of other tasks such as classification. This article presents a hybrid optimization method for the FS problem; it combines the salp swarm algorithm (SSA) with particle swarm optimization. The hybridization of both approaches creates an algorithm called SSAPSO, in which the efficacy of the exploration and exploitation steps is improved. To verify the performance of the proposed algorithm, it is tested in two experimental series: in the first, it is compared with other similar approaches using benchmark functions; in the second set of experiments, SSAPSO is used to determine the best set of features over different UCI datasets, where redundant or confusing features are removed from the original dataset while keeping or improving the accuracy. The experimental results provide evidence of the enhancement of SSAPSO regarding performance and accuracy without affecting the computational effort.
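
Wrapper-style feature selection of the kind described here typically scores a binary feature mask by the classification error it produces plus a penalty on the number of features kept. The sketch below shows such a fitness function with a KNN classifier; the weighting alpha and the classifier choice are illustrative assumptions, and the SSA/PSO swarm update itself is omitted.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fs_fitness(mask, X, y, alpha=0.99):
    """Lower is better: weighted classification error plus feature-count ratio."""
    if mask.sum() == 0:            # an empty subset is invalid
        return 1.0
    X_sel = X[:, mask.astype(bool)]
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X_sel, y, cv=5).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / X.shape[1]

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# A population of random binary masks stands in for the SSA/PSO search agents.
population = rng.integers(0, 2, size=(10, X.shape[1]))
scores = [fs_fitness(m, X, y) for m in population]
best = population[int(np.argmin(scores))]
print("best fitness:", min(scores), "features kept:", int(best.sum()))
```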

Journal ArticleDOI
TL;DR: A reproducible, cloud-based pipeline is applied to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes to demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive.
Abstract: Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.

Journal ArticleDOI
TL;DR: A novel distributed filter is first constructed to practically reflect the impact from both cyber-attacks and gain perturbations and an upper bound of filtering error covariance is derived by resorting to some typical matrix inequalities.
Abstract: This paper addresses the distributed resilient filtering problem for a class of power systems subject to denial-of-service (DoS) attacks. A novel distributed filter is first constructed to practically reflect the impact from both cyber-attacks and gain perturbations. For all possible occurrence of DoS attacks and gain perturbations, an upper bound of filtering error covariance is derived by resorting to some typical matrix inequalities. Furthermore, the desired filter gain relying on the solution of two Riccati-like difference equations is obtained with the help of the gradient-based approach and the mathematical induction. The developed algorithm with a recursive form is independent of the global information and thus satisfies the requirements of scalability and distributed implementation online. Finally, a benchmark simulation test is exploited to check the usefulness of the designed filter.

Proceedings Article
03 Jul 2019
TL;DR: This paper explores multi-task approaches that share a single BERT model with a small number of additional task-specific parameters, and obtains state-of-the-art results on the Recognizing Textual Entailment dataset.
Abstract: Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or `projected attention layers', we match the performance of separately fine-tuned models on the GLUE benchmark with roughly 7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
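
The "projected attention layer" idea is a small task-specific adapter that projects the hidden state down, runs attention in the low-dimensional space, projects back up, and is added to the shared BERT layer's output. A rough PyTorch sketch is below; the dimensions and the single-head attention are simplifications, not the paper's exact multi-head PAL configuration.

```python
import torch
import torch.nn as nn

class ProjectedAttentionLayer(nn.Module):
    """Task-specific low-rank attention adapter added to a shared encoder layer."""
    def __init__(self, hidden_dim=768, pal_dim=204):
        super().__init__()
        self.down = nn.Linear(hidden_dim, pal_dim)            # project down
        self.attn = nn.MultiheadAttention(pal_dim, num_heads=1, batch_first=True)
        self.up = nn.Linear(pal_dim, hidden_dim)              # project back up

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim)
        low = self.down(hidden_states)
        attended, _ = self.attn(low, low, low)                # self-attention in the small space
        return self.up(attended)                              # residual is added by the caller

# Usage: add the adapter output to the shared layer's output for one task.
shared_layer_output = torch.randn(2, 16, 768)                 # stand-in for a BERT layer output
pal = ProjectedAttentionLayer()
task_specific = shared_layer_output + pal(shared_layer_output)
print(task_specific.shape)   # torch.Size([2, 16, 768])
```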

Journal ArticleDOI
TL;DR: In this article, the authors identify hardware that is optimal for producing molecular dynamics (MD) trajectories on Linux compute clusters with the GROMACS 2018 simulation package, and benchmark the performance of this hardware.
Abstract: We identify hardware that is optimal to produce molecular dynamics (MD) trajectories on Linux compute clusters with the GROMACS 2018 simulation package. Therefore, we benchmark the GROMACS performa ...

Journal ArticleDOI
Harish Garg1
TL;DR: A new hybrid GSA-GA algorithm is presented for constrained nonlinear optimization problems with mixed variables; the search is tuned with the gravitational search algorithm, and each solution is upgraded with genetic operators such as selection, crossover, and mutation.

Journal ArticleDOI
TL;DR: It is demonstrated that for specific benchmark settings and a selected range of problems, the accuracy metric can reach chemical accuracy when computing over the cloud on certain quantum computers.
Abstract: We present a quantum chemistry benchmark for noisy intermediate-scale quantum computers that leverages the variational quantum eigensolver, active-space reduction, a reduced unitary coupled cluster ansatz, and reduced density purification as error mitigation. We demonstrate this benchmark using 4 of the available qubits on the 20-qubit IBM Tokyo and 16-qubit Rigetti Aspen processors via the simulation of alkali metal hydrides (NaH, KH, RbH), with accuracy of the computed ground state energy serving as the primary benchmark metric. We further parameterize this benchmark suite on the trial circuit type, the level of symmetry reduction, and error mitigation strategies. Our results demonstrate the characteristically high noise level present in near-term superconducting hardware, but provide a relevant baseline for future improvement of the underlying hardware, and a means for comparison across near-term hardware types. We also demonstrate how to reduce the noise in post processing with specific error mitigation techniques. Particularly, the adaptation of McWeeny purification of noisy density matrices dramatically improves accuracy of quantum computations, which, along with adjustable active space, significantly extends the range of accessible molecular systems. We demonstrate that for specific benchmark settings and a selected range of problems, the accuracy metric can reach chemical accuracy when computing over the cloud on certain quantum computers.
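
The benchmark is built around the variational quantum eigensolver: a classical optimizer tunes circuit parameters to minimize the measured energy ⟨ψ(θ)|H|ψ(θ)⟩. The toy sketch below reproduces that loop classically for a single-qubit Hamiltonian using exact statevectors, just to show the structure of the objective; it involves none of the hardware, ansatz, or error-mitigation details from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Pauli matrices and a toy single-qubit Hamiltonian H = 0.5*Z + 0.3*X (made-up coefficients).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.5 * Z + 0.3 * X

def ansatz_state(theta):
    """|psi(theta)> = Ry(theta)|0>, a one-parameter stand-in for a trial circuit."""
    return np.array([np.cos(theta[0] / 2), np.sin(theta[0] / 2)], dtype=complex)

def energy(theta):
    psi = ansatz_state(theta)
    return float(np.real(psi.conj() @ H @ psi))

result = minimize(energy, x0=[0.1], method="COBYLA")   # classical outer loop
exact = np.min(np.linalg.eigvalsh(H))
print("VQE-style estimate:", result.fun, " exact ground energy:", exact)
```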

Journal ArticleDOI
Chenglong Li1, Xinyan Liang1, Yijuan Lu2, Nan Zhao1, Jin Tang1 
TL;DR: A novel graph-based approach to learn a robust object representation for RGB-T tracking is proposed, in which the tracked object is represented with a graph with image patches as nodes, dynamically learned in a unified ADMM (alternating direction method of multipliers)-based optimization framework.

Journal ArticleDOI
TL;DR: A large-scale deep learning model is developed to predict price movements from limit order book (LOB) data of cash equities; it utilizes convolutional filters to capture the spatial structure of the LOBs as well as long short-term memory modules to capture longer time dependencies, and it outperforms existing methods on the benchmark LOB dataset of Ntakaris et al.
Abstract: We develop a large-scale deep learning model to predict price movements from limit order book (LOB) data of cash equities. The architecture utilizes convolutional filters to capture the spatial structure of the LOBs as well as long short-term memory modules to capture longer time dependencies. The proposed network outperforms all existing state-of-the-art algorithms on the benchmark LOB dataset [A. Ntakaris, M. Magris, J. Kanniainen, M. Gabbouj, and A. Iosifidis, “Benchmark dataset for mid-price prediction of limit order book data with machine learning methods,” J. Forecasting , vol. 37, no. 8, 852–866, 2018]. In a more realistic setting, we test our model by using one-year market quotes from the London Stock Exchange, and the model delivers a remarkably stable out-of-sample prediction accuracy for a variety of instruments. Importantly, our model translates well to instruments that were not part of the training set, indicating the model's ability to extract universal features. In order to better understand these features and to go beyond a “black box” model, we perform a sensitivity analysis to understand the rationale behind the model predictions and reveal the components of LOBs that are most relevant. The ability to extract robust features that translate well to other instruments is an important property of our model, which has many other applications.
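
The architecture described, convolutional filters over the order-book depth dimension feeding an LSTM over time and ending in a three-class (up/stationary/down) prediction, can be sketched roughly as follows in PyTorch. The layer sizes and the input layout (100 snapshots by 40 LOB features) are assumptions consistent with common LOB setups, not the exact published configuration.

```python
import torch
import torch.nn as nn

class LOBNet(nn.Module):
    """Conv layers capture spatial LOB structure; an LSTM captures time dependencies."""
    def __init__(self, n_features=40, n_classes=3, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 2), stride=(1, 2)),    # pair up price/volume columns
            nn.LeakyReLU(),
            nn.Conv2d(16, 16, kernel_size=(4, 1), padding=(2, 0)),   # mix nearby time steps
            nn.LeakyReLU(),
            nn.Conv2d(16, 32, kernel_size=(1, n_features // 2)),     # collapse the depth axis
            nn.LeakyReLU(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, 1, time_steps, n_features)
        z = self.conv(x)                       # (batch, 32, time', 1)
        z = z.squeeze(-1).permute(0, 2, 1)     # (batch, time', 32) for the LSTM
        out, _ = self.lstm(z)
        return self.head(out[:, -1])           # classify from the last time step

model = LOBNet()
dummy = torch.randn(8, 1, 100, 40)             # 8 samples, 100 snapshots, 40 LOB features
print(model(dummy).shape)                      # torch.Size([8, 3])
```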

Posted Content
TL;DR: This work co-designs the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 300%, and at least by 20% for a number of real-world benchmark models.
Abstract: Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach, SwitchML, reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide an efficient solution that speeds up training by up to 5.5× for a number of real-world benchmark models.
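
The core operation SwitchML moves into the switch is summing the workers' model-update vectors chunk by chunk, so each worker receives one aggregated stream instead of exchanging updates all-to-all. The toy sketch below shows that aggregation, including the fixed-point quantization such integer dataplanes typically require; it is a conceptual illustration in plain Python, not the switch program or host protocol from the paper.

```python
import numpy as np

SCALE = 2 ** 16   # programmable switches operate on integers, so floats are scaled (assumption)

def worker_chunks(gradient, chunk_size):
    """Quantize a worker's gradient and split it into fixed-size integer chunks."""
    q = np.round(gradient * SCALE).astype(np.int64)
    return [q[i:i + chunk_size] for i in range(0, len(q), chunk_size)]

def switch_aggregate(all_worker_chunks):
    """What the dataplane does per chunk slot: element-wise sum across workers."""
    return [np.sum(chunks, axis=0) for chunks in zip(*all_worker_chunks)]

def dequantize(chunks, n_workers):
    return np.concatenate(chunks).astype(np.float64) / (SCALE * n_workers)  # mean update

# Four workers, a 10-element "model update", chunks of 4 elements.
rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(4)]
aggregated = dequantize(switch_aggregate([worker_chunks(g, 4) for g in grads]), n_workers=4)
print(np.allclose(aggregated, np.mean(grads, axis=0), atol=1e-4))   # True
```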

Posted Content
TL;DR: This work shows that this challenging learning problem can be simplified by decomposing it into two stages and uses the presented approach to train a vision-based autonomous driving system that substantially outperforms the state of the art on the CARLA benchmark and the recent NoCrash benchmark.
Abstract: Vision-based urban driving is hard. The autonomous system needs to learn to perceive the world and act in it. We show that this challenging learning problem can be simplified by decomposing it into two stages. We first train an agent that has access to privileged information. This privileged agent cheats by observing the ground-truth layout of the environment and the positions of all traffic participants. In the second stage, the privileged agent acts as a teacher that trains a purely vision-based sensorimotor agent. The resulting sensorimotor agent does not have access to any privileged information and does not cheat. This two-stage training procedure is counter-intuitive at first, but has a number of important advantages that we analyze and empirically demonstrate. We use the presented approach to train a vision-based autonomous driving system that substantially outperforms the state of the art on the CARLA benchmark and the recent NoCrash benchmark. Our approach achieves, for the first time, 100% success rate on all tasks in the original CARLA benchmark, sets a new record on the NoCrash benchmark, and reduces the frequency of infractions by an order of magnitude compared to the prior state of the art. For the video that summarizes this work, see this https URL

Journal ArticleDOI
TL;DR: A quantum circuit learning algorithm that can be used to assist the characterization of quantum devices and to train shallow circuits for generative tasks is proposed and it is demonstrated that this approach can learn an optimal preparation of the Greenberger-Horne-Zeilinger states.
Abstract: Hybrid quantum-classical algorithms provide ways to use noisy intermediate-scale quantum computers for practical applications. Expanding the portfolio of such techniques, we propose a quantum circuit learning algorithm that can be used to assist the characterization of quantum devices and to train shallow circuits for generative tasks. The procedure leverages quantum hardware capabilities to its fullest extent by using native gates and their qubit connectivity. We demonstrate that our approach can learn an optimal preparation of the Greenberger-Horne-Zeilinger states, also known as “cat states”. We further demonstrate that our approach can efficiently prepare approximate representations of coherent thermal states, wave functions that encode Boltzmann probabilities in their amplitudes. Finally, complementing proposals to characterize the power or usefulness of near-term quantum devices, such as IBM’s quantum volume, we provide a new hardware-independent metric called the qBAS score. It is based on the performance yield in a specific sampling task on one of the canonical machine learning data sets known as Bars and Stripes. We show how entanglement is a key ingredient in encoding the patterns of this data set; an ideal benchmark for testing hardware starting at four qubits and up. We provide experimental results and evaluation of this metric to probe the trade off between several architectural circuit designs and circuit depths on an ion-trap quantum computer.

Journal ArticleDOI
TL;DR: This paper proposes the heterogeneous coded matrix multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, which is provably asymptotically optimal for a broad class of processing time distributions, and develops a heuristic algorithm for HCMM load allocation for the distributed implementation of budget-limited computation tasks.
Abstract: In large-scale distributed computing clusters, such as Amazon EC2, there are several types of “system noise” that can result in major degradation of performance: system failures, bottlenecks due to limited communication bandwidth, latency due to straggler nodes, and so on. There have been recent results that demonstrate the impact of coding for efficient utilization of computation and storage redundancy to alleviate the effect of stragglers and communication bottlenecks in homogeneous clusters. In this paper, we focus on general heterogeneous distributed computing clusters consisting of a variety of computing machines with different capabilities. We propose a coding framework for speeding up distributed computing in heterogeneous clusters by trading redundancy for reduced computation latency. In particular, we propose the heterogeneous coded matrix multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, which is provably asymptotically optimal for a broad class of processing time distributions. Moreover, we show that HCMM is unboundedly faster than any uncoded scheme that partitions the total workload among the workers. To demonstrate how the proposed HCMM scheme can be applied in practice, we provide results from numerical studies and Amazon EC2 experiments comparing HCMM with three benchmark load allocation schemes: uniform uncoded, load-balanced uncoded, and uniform coded. In particular, in our numerical studies, HCMM achieves speedups of up to 73%, 56%, and 42%, respectively, over the three benchmark schemes mentioned earlier. Furthermore, we carry out experiments over Amazon EC2 clusters and demonstrate how HCMM can be combined with rateless codes with nearly linear decoding complexity. In particular, we show that HCMM combined with the Luby transform codes can significantly reduce the overall execution time. HCMM is found to be up to 61%, 46%, and 36% faster than the aforementioned three benchmark schemes, respectively. Additionally, we provide a generalization to the problem of optimal load allocation in heterogeneous settings, where we take into account the monetary costs associated with distributed computing clusters. We argue that HCMM is asymptotically optimal for budget-constrained scenarios as well. In particular, we characterize the minimum possible expected cost associated with a computation task over a given cluster of machines. Furthermore, we develop a heuristic algorithm for HCMM load allocation for the distributed implementation of budget-limited computation tasks.
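
The coding idea behind HCMM is to give workers redundant, linearly encoded pieces of the matrix so the master can recover A·x from any sufficiently large subset of results, masking stragglers. The small random-linear-code example below illustrates that mechanism; the encoding, cluster size, and simulated stragglers are illustrative choices and do not reproduce HCMM's heterogeneous load-allocation analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Split A into k row blocks and encode them into n > k coded blocks.
k, n = 4, 6                       # any k of the n worker results suffice to decode
A = rng.normal(size=(8, 5))       # 8 rows -> 4 blocks of 2 rows each
x = rng.normal(size=5)
blocks = np.split(A, k)

G = rng.normal(size=(n, k))       # random generator matrix (invertible subsets w.p. 1)
coded_blocks = [sum(G[j, i] * blocks[i] for i in range(k)) for j in range(n)]

# Each worker j computes its coded block times x; stragglers never respond.
results = {j: coded_blocks[j] @ x for j in range(n)}
finished = sorted(rng.choice(n, size=k, replace=False))      # pretend only these returned

# Decode: solve G_sub * [blocks_i @ x] = [coded results] for the k uncoded block products.
G_sub = G[finished, :]
coded_results = np.stack([results[j] for j in finished])      # (k, rows_per_block)
decoded_blocks = np.linalg.solve(G_sub, coded_results)
recovered = decoded_blocks.reshape(-1)

print(np.allclose(recovered, A @ x))   # True: A @ x recovered despite 2 missing workers
```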

Posted Content
TL;DR: MLPerf, as discussed by the authors, is an ML benchmark that overcomes three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult.
Abstract: Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an ML benchmark that overcomes these challenges. Our analysis quantitatively evaluates MLPerf's efficacy at driving performance and scalability improvements across two rounds of results from multiple vendors.

Posted Content
TL;DR: ParaDnn is introduced, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected, convolutional (CNN), and recurrent (RNN) neural networks, and the rapid performance improvements that specialized software stacks provide for the TPU and GPU platforms are quantified.
Abstract: Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into TPU architecture, reveal its bottlenecks, and highlight valuable lessons learned for future specialized system design. We also provide a thorough comparison of the platforms and find that each has unique strengths for some types of models. Finally, we quantify the rapid performance improvements that specialized software stacks provide for the TPU and GPU platforms.

Journal ArticleDOI
TL;DR: Experimental results indicate that in terms of robustness, stability and quality of the solution obtained, AGDE is significantly better than, or at least comparable to state-of-the-art approaches.
Abstract: This paper presents the adaptive guided differential evolution algorithm (AGDE) for solving global numerical optimization problems over continuous space. In order to utilize the information of good and bad vectors in the DE population, the proposed algorithm introduces a new mutation rule. It uses two randomly chosen vectors from the top and the bottom 100p% individuals in the current population of size NP, while the third vector is selected randomly from the middle [NP - 2(100p%)] individuals. This new mutation scheme helps effectively maintain the balance between the global exploration and local exploitation abilities of the DE search process. Besides, a novel and effective adaptation scheme is used to update the values of the crossover rate to appropriate values without either extra parameters or prior knowledge of the characteristics of the optimization problem. In order to verify and analyze the performance of AGDE, numerical experiments are executed on a set of 28 test problems from the CEC2013 benchmark for 10, 30, and 50 dimensions, including a comparison with classical DE schemes and some recent evolutionary algorithms. Experimental results indicate that in terms of robustness, stability and quality of the solution obtained, AGDE is significantly better than, or at least comparable to, state-of-the-art approaches.
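
The new mutation rule described above, a base vector drawn from the middle of the population perturbed by the difference between one of the top and one of the bottom 100p% individuals, can be written compactly as follows. The scale factor F and the value of p are standard DE-style placeholders, and the crossover and parameter-adaptation parts of AGDE are omitted.

```python
import numpy as np

def agde_style_mutation(pop, fitness, p=0.1, F=0.7, rng=None):
    """
    pop: (NP, D) population, fitness: (NP,) values (lower is better).
    Returns mutant vectors v_i = x_middle + F * (x_top - x_bottom).
    """
    if rng is None:
        rng = np.random.default_rng()
    NP, D = pop.shape
    order = np.argsort(fitness)                 # best first
    n_p = max(1, int(round(p * NP)))
    top, bottom = order[:n_p], order[-n_p:]
    middle = order[n_p:NP - n_p]

    mutants = np.empty_like(pop)
    for i in range(NP):
        x_top = pop[rng.choice(top)]            # a good individual
        x_bottom = pop[rng.choice(bottom)]      # a poor individual
        x_mid = pop[rng.choice(middle)]         # base vector from the middle of the population
        mutants[i] = x_mid + F * (x_top - x_bottom)
    return mutants

# Tiny demonstration on a random population for a 5-D sphere function.
rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(20, 5))
fit = np.sum(pop ** 2, axis=1)
print(agde_style_mutation(pop, fit, rng=rng).shape)   # (20, 5)
```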

Posted Content
TL;DR: This IEEE PES Task Force report proposes a standardized AC-OPF mathematical formulation and the PGLib-OPF networks for benchmarking AC-OPF algorithms; a motivating study demonstrates some limitations of the established network datasets in the context of benchmarking AC-OPF algorithms.
Abstract: In recent years, the power systems research community has seen an explosion of novel methods for formulating the AC power flow equations. Consequently, benchmarking studies using the seminal AC Optimal Power Flow (AC-OPF) problem have emerged as the primary method for evaluating these emerging methods. However, it is often difficult to directly compare these studies due to subtle differences in the AC-OPF problem formulation as well as the network, generation, and loading data that are used for evaluation. To help address these challenges, this IEEE PES Task Force report proposes a standardized AC-OPF mathematical formulation and the PGLib-OPF networks for benchmarking AC-OPF algorithms. A motivating study demonstrates some limitations of the established network datasets in the context of benchmarking AC-OPF algorithms and a validation study demonstrates the efficacy of using the PGLib-OPF networks for this purpose. In the interest of scientific discourse and future additions, the PGLib-OPF benchmark library is open-access and all of the network data is provided under a creative commons license.
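
For reference, the AC-OPF problem that the PGLib-OPF networks are intended to benchmark has, in its canonical polar form, roughly the structure below. This is a condensed textbook statement (generation-cost minimization subject to power-balance, voltage, and branch limits) with my own notation, not the Task Force report's full standardized formulation.

```latex
\begin{aligned}
\min_{V,\,\theta,\,P^g,\,Q^g}\quad & \sum_{i\in\mathcal{G}} c_{2i}\,(P^g_i)^2 + c_{1i}\,P^g_i + c_{0i} \\
\text{s.t.}\quad
& P^g_i - P^d_i = V_i \sum_{j} V_j\big(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\big)
  && \forall i \;\text{(active power balance)}\\
& Q^g_i - Q^d_i = V_i \sum_{j} V_j\big(G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}\big)
  && \forall i \;\text{(reactive power balance)}\\
& \underline{V}_i \le V_i \le \overline{V}_i,\qquad
  \underline{P}^g_i \le P^g_i \le \overline{P}^g_i,\qquad
  \underline{Q}^g_i \le Q^g_i \le \overline{Q}^g_i \\
& |S_{ij}(V,\theta)| \le \overline{S}_{ij}
  && \forall (i,j) \;\text{(branch thermal limits)}
\end{aligned}
```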

Journal ArticleDOI
TL;DR: Considering the accuracy, false alarms, size, and running time of the proposed CNN-based system, it is believed that it is a suitable candidate for fire detection in uncertain IoT environments for mobile and embedded vision applications during surveillance.
Abstract: Tactile Internet can combine multiple technologies by enabling intelligence via mobile edge computing and data transmission over a 5G network. Recently, several convolutional neural network (CNN) based methods using edge intelligence have been utilized for fire detection in certain environments with reasonable accuracy and running time. However, these methods fail to detect fire in uncertain Internet of Things (IoT) environments with smoke, fog, and snow. Furthermore, achieving good accuracy with reduced running time and model size is challenging for resource-constrained devices. Therefore, in this paper, we propose an efficient CNN-based system for fire detection in videos captured in uncertain surveillance scenarios. Our approach uses lightweight deep neural networks with no dense fully connected layers, making it computationally inexpensive. Experiments are conducted on benchmark fire datasets and the results reveal the better performance of our approach compared to the state of the art. Considering the accuracy, false alarms, size, and running time of our system, we believe that it is a suitable candidate for fire detection in uncertain IoT environments for mobile and embedded vision applications during surveillance.