Proceedings Article

Staleness and Stragglers in Distributed Deep Image Analytics

TL;DR: This paper identifies staleness among worker nodes as the main problem in distributed training, describes in detail the methods used to address it, and highlights open research problems in the field.
Abstract: Deep learning for image analytics is widely used in many real-world applications. Due to the rapid growth in data and model size, there is a need to distribute models across multiple nodes. Distributed computation of the model helps to improve scalability, training time, and cost effectiveness. But distribution can lead to longer computation times when some nodes are stale. The computational time of the distributed nodes is affected by many factors, such as latency due to communication, network connectivity, resource sharing, and computational power. The main problem faced in distributed training is staleness among the worker nodes. The effect of stragglers cannot be completely avoided in distributed clusters. Failures in storage and disks, imbalanced workloads, and resource sharing are the main causes of stragglers. Stragglers can cause longer computation times and reduce the performance of the model. The different methods used to address this issue are described in detail in the paper. The open research problems in this field are also highlighted.
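
As a rough illustration of why stragglers lengthen computation time, the sketch below models one synchronous training step, which can finish only when the slowest worker finishes. The per-worker timings are hypothetical, and the backup-worker variant (proceeding once the fastest n workers have reported) is one commonly discussed mitigation, included here as an assumption rather than as a method taken from the paper.

```python
def sync_step_time(worker_times):
    """A fully synchronous step waits for every worker,
    so it takes as long as the slowest one."""
    return max(worker_times)


def backup_worker_step_time(worker_times, n_required):
    """Proceed as soon as the fastest n_required workers have finished;
    results from the remaining (slowest) workers are dropped."""
    return sorted(worker_times)[n_required - 1]


# Hypothetical per-worker compute times in seconds; the last worker is a straggler.
times = [1.0, 1.1, 0.9, 4.0]

print(sync_step_time(times))               # step time dictated by the straggler
print(backup_worker_step_time(times, 3))   # step time when one slow worker is ignored
```

In this toy setting a single slow node sets the pace of the whole step, while ignoring it brings the step time back to that of the third-fastest worker, at the cost of discarding its work.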
Citations
Book Chapter
21 Oct 2022
TL;DR: In this paper, the deployment of HPC accelerators for CNN and how acceleration is achieved is discussed, and the leading cloud platforms used in computer vision for acceleration are also listed.
Abstract: Image processing combined with computer vision is creating a vast breakthrough in many research, industry-related, and social applications. The growth of big data has led to a large quantity of high-resolution images that can be used in complex applications and processing. There is a need for rapid image processing methods to find accurate and faster results for time-critical applications. In such cases, there is a need to accelerate the algorithms and models using HPC systems. The acceleration of these algorithms can be obtained using hardware accelerators like GPU, TPU, FPGA, etc. The GPU and TPU are mainly used to implement and run the algorithms in parallel. The acceleration method and hardware selection are challenging since numerous accelerators are available, requiring deep knowledge and understanding of the algorithms. This chapter explains the deployment of HPC accelerators for CNN and how acceleration is achieved. The leading cloud platforms used in computer vision for acceleration are also listed.
References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network achieved state-of-the-art performance on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations
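
The abstract above attributes the gains to depth built from very small (3x3) filters. A short arithmetic sketch of the usual argument for that choice follows: stacked 3x3 layers cover the same receptive field as one larger filter while using fewer weights. The helper names are hypothetical, and the count assumes stride 1, no biases, and the same number of input and output channels in every layer.

```python
def conv_weights(kernel, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions of size `kernel`."""
    return 1 + layers * (kernel - 1)


channels = 64  # illustrative channel width, chosen only for this example

# Three stacked 3x3 layers see the same 7x7 window as a single 7x7 layer.
print(stacked_receptive_field(3, 3))
print(3 * conv_weights(3, channels))   # weights in the 3x3 stack
print(conv_weights(7, channels))       # weights in the single 7x7 layer
```

Under these assumptions the stack needs 27C² weights against 49C² for the single large filter, and it also interleaves more non-linearities.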

Proceedings Article
27 Jun 2016
TL;DR: In this article, the authors explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.
Abstract: Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and using fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.

16,962 citations

Proceedings Article
03 Dec 2012
TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieved state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.

3,475 citations
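
Asynchronous SGD of the kind described above lets replicas push gradients that were computed on parameters that have since changed. The sketch below is a toy model of that staleness on the one-dimensional objective f(w) = 0.5 * w^2, not an implementation of Downpour SGD or DistBelief; the function name, the fixed delay, and the learning rate are all assumptions made for illustration.

```python
def async_sgd(staleness, lr=0.5, steps=50, w0=1.0):
    """Gradient descent on f(w) = 0.5 * w**2 where each update uses a
    gradient evaluated on parameters that are `staleness` steps old.
    Returns the full parameter trajectory, starting with w0."""
    history = [w0]
    for t in range(steps):
        # Parameters the (simulated) worker read before computing its gradient.
        stale_w = history[max(0, t - staleness)]
        grad = stale_w  # derivative of 0.5 * w**2 at stale_w
        history.append(history[-1] - lr * grad)
    return history


fresh = async_sgd(staleness=0)  # every gradient uses the current parameters
stale = async_sgd(staleness=5)  # every gradient is five updates out of date

print(abs(fresh[-1]))               # distance from the optimum with fresh gradients
print(max(abs(w) for w in stale))   # largest excursion with stale gradients
```

With fresh gradients the iterate halves at every step, whereas with a delay of five updates the same learning rate overshoots the optimum and the iterates swing further from it than the starting point.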

Book
01 Nov 1990
TL;DR: The juxtaposition of these two expressions in the title reflects the ambition of the authors to produce a reference work, both for engineers who use adaptive algorithms and for probabilists or statisticians who would like to study stochastic approximations in terms of problems arising from real applications.
Abstract: Adaptive systems are widely encountered in many applications ranging through adaptive filtering and more generally adaptive signal processing, systems identification and adaptive control, to pattern recognition and machine intelligence: adaptation is now recognised as a keystone of "intelligence" within computerised systems. These diverse areas echo the classes of models which conveniently describe each corresponding system. Thus although there can hardly be a "general theory of adaptive systems" encompassing both the modelling task and the design of the adaptation procedure, nevertheless, these diverse issues have a major common component: namely the use of adaptive algorithms, also known as stochastic approximations in the mathematical statistics literature, that is to say the adaptation procedure (once all modelling problems have been resolved). The juxtaposition of these two expressions in the title reflects the ambition of the authors to produce a reference work, both for engineers who use these adaptive algorithms and for probabilists or statisticians who would like to study stochastic approximations in terms of problems arising from real applications. Hence the book is organised in two parts, the first one user-oriented, and the second providing the mathematical foundations to support the practice described in the first part. The book covers the topics of convergence, convergence rate, permanent adaptation and tracking, change detection, and is illustrated by various realistic applications originating from these areas of applications.

2,212 citations